The Shortest Common Supersequence (SCS) is finding the shortest supersequence Z of given sequences X and Y such that both X and Y are subsequences of Z.

For example,

Consider the two following sequences, X and Y:
X: ABCBDAB
Y: BDCABA
 
The length of the SCS is 9
SCS are ABCBDCABA, ABDCABDAB, and ABDCBDABA

Practice this problem

1. Finding Length of Shortest Common Supersequence

The shortest common supersequence problem is closely related to the Longest Common Subsequence (LCS) problem. The length of the shortest common supersequence of two subsequences can be easily derived from the length of their longest common subsequence. For given two sequences X and Y of length m and n, respectively, the relation is:

SCSLength(X, Y) = m + n – LCSLength(X, Y)

This relation works as LCS detects common characters present in the string, and in order for a common supersequence to be of minimal length, the common characters should be present in it only once. So, the length of SCS is equal to the sum of both strings minus their LCS length. This is demonstrated below in C++, Java, and Python:

C++


Download  Run Code

Output:

The length of the shortest common supersequence is 9

Java


Download  Run Code

Output:

The length of the shortest common supersequence is 9

Python


Download  Run Code

Output:

The length of the shortest common supersequence is 9

2. Finding the Shortest Common Supersequence

For two input sequences, an SCS can be easily formed from the Longest Common Subsequence (LCS) problem. The idea is to construct an SCS by considering the non-lcs characters while preserving the character order.

For example, consider two sequences abcbdab and bdcaba having LCS equal to bcba. By inserting the non-lcs characters and preserving the character order, we get the string abdcabdab, our SCS.

How can we achieve this?

The idea is to fill the lookup table of LCS by finding the length of LCS of given sequences, X[0…m-1] and Y[0…n-1] in a bottom-up manner. Then recursively find the SCS in a top-down manner using values in the lookup table, i.e., follow the path from the bottom-right corner of the lookup table to its top-left corner when reading out an SCS.

  • If the current characters of X and Y are equal, they are part of the SCS, and we move diagonally in the lookup table (i.e., recur to find SCS of sequence X[0…m-2], Y[0…n-2]).
  • If the current characters of X and Y are different, go up or left, depending on which cell has a higher number.
    1. If a top cell of the current cell has more value than the left cell, include the current character of sequence X in SCS and recur to find SCS of sequence X[0…m-2], Y[0…n-1].
    2. If a left cell of the current cell has more value than the top cell, include the current character of sequence Y in SCS and recur to find SCS of sequence X[0…m-1], Y[0…n-2].
  • If the end of either sequence is reached, then the other sequence forms part of the SCS.

 
Following is the implementation of the above approach in C++, Java, and Python:

C++


Download  Run Code

Output:

The shortest common supersequence of ABCBDAB and BDCABA is ABCBDCABA

Java


Download  Run Code

Output:

The shortest common supersequence of ABCBDAB and BDCABA is ABCBDCABA

Python


Download  Run Code

Output:

The shortest common supersequence of ABCBDAB and BDCABA is ABCBDCABA

The time complexity of the above solution is O(m.n) and requires O(m.n) extra space, where m is the length of the first string and n is the length of the second string.

3. Finding All Shortest Common Supersequence

The shortest common supersequence for two strings is not unique, and the above code doesn’t generate all SCS of the given sequences. It will always print one SCS. The problem with the above code is when the last characters of X and Y are different, we either go up or left in the lookup table depending upon which side has a larger value.

How can we modify the code so it prints all possible SCS?

The idea remains the same except if the top cell is equal to the left cell, consider both top and left directions; otherwise, go in the direction of greater value.

In other words, if the last characters of X and Y are not the same, then SCS can be constructed from either the top side or from the left side of the lookup table, depending upon which side has a larger value. If both the values are equal, then SCS can be constructed from both directions in the lookup table. So, we go in the direction of a larger value or go in both directions if the values are equal.

Following is the C++, Java, and Python implementation based on the above idea:

C++


Download  Run Code

Output:

All shortest common supersequence of ABCBDAB and BDCABA are:
 
ABCBDCABA
ABDCABDAB
ABDCBDABA

Java


Download  Run Code

Output:

All shortest common supersequence of ABCBDAB and BDCABA are: [ABDCBDABA, ABDCABDAB, ABCBDCABA]

Python


Download  Run Code

Output:

All shortest common supersequence of ABCBDAB and BDCABA are: {‘ABCBDCABA’, ‘ABDCABDAB’, ‘ABDCBDABA’}