Longest Common Substring Problem
The longest common substring problem is the problem of finding the longest string (or strings) that is a substring (or are substrings) of two strings.
The problem differs from the problem of finding the Longest Common Subsequence (LCS). Unlike subsequences, substrings are required to occupy consecutive positions within the original string.
For example, the longest common substring of strings ABABC, BABCA is the string BABC having length 4. Other common substrings are ABC, A, AB, B, BA, BC, and C.
A naive solution would be to consider all substrings of the second string and find the longest substring that is also a substring of the first string. The time complexity of this solution would be O((m + n) × m2), where m and n are the length of the strings X and Y, as it takes (m+n) time for substring search, and there are m2 substrings of the second string. We can optimize this method by considering substrings in order of their decreasing lengths and return as soon as any substring matches the first string. But the worst-case time complexity remains the same when no common characters are present.
Can we do better?
The idea is to find the longest common suffix for all pairs of prefixes of the strings using dynamic programming using the relation:
| 0 (otherwise)
where,
0 <= i – 1 < m, where
m is the length of string X0 <= j – 1 < n, where
n is the length of string Y
For example, consider strings ABAB and BABA.

Finally, the longest common substring length would be the maximal of these longest common suffixes of all possible prefixes.
The following solution in C++, Java, and Python finds the length of the longest repeated subsequence of sequences X and Y iteratively using the optimal substructure property of the LCS problem.
C++
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
#include <iostream> #include <string> #include <cstring> using namespace std; // Function to find the longest common substring of sequences // `X[0…m-1]` and `Y[0…n-1]` string LCS(string X, string Y, int m, int n) { int maxlen = 0; // stores the max length of LCS int endingIndex = m; // stores the ending index of LCS in `X` // `lookup[i][j]` stores the length of LCS of substring `X[0…i-1]`, `Y[0…j-1]` int lookup[m + 1][n + 1]; // initialize all cells of the lookup table to 0 memset(lookup, 0, sizeof(lookup)); // fill the lookup table in a bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if the current character of `X` and `Y` matches if (X[i - 1] == Y[j - 1]) { lookup[i][j] = lookup[i - 1][j - 1] + 1; // update the maximum length and ending index if (lookup[i][j] > maxlen) { maxlen = lookup[i][j]; endingIndex = i; } } } } // return longest common substring having length `maxlen` return X.substr(endingIndex - maxlen, maxlen); } int main() { string X = "ABC", Y = "BABA"; int m = X.length(), n = Y.length(); // Find longest common substring cout << "The longest common substring is " << LCS(X, Y, m, n); return 0; } |
Output:
The longest common substring is AB
Java
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
class Main { // Function to find the longest common substring of sequences // `X[0…m-1]` and `Y[0…n-1]` public static String LCS(String X, String Y, int m, int n) { int maxlen = 0; // stores the max length of LCS int endingIndex = m; // stores the ending index of LCS in `X` // `lookup[i][j]` stores the length of LCS of substring // `X[0…i-1]`, `Y[0…j-1]` int[][] lookup = new int[m + 1][n + 1]; // fill the lookup table in a bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if the current character of `X` and `Y` matches if (X.charAt(i - 1) == Y.charAt(j - 1)) { lookup[i][j] = lookup[i - 1][j - 1] + 1; // update the maximum length and ending index if (lookup[i][j] > maxlen) { maxlen = lookup[i][j]; endingIndex = i; } } } } // return longest common substring having length `maxlen` return X.substring(endingIndex - maxlen, endingIndex); } public static void main(String[] args) { String X = "ABC", Y = "BABA"; int m = X.length(), n = Y.length(); // Find longest common substring System.out.print("The longest common substring is " + LCS(X, Y, m, n)); } } |
Output:
The longest common substring is AB
Python
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
# Function to find the longest common substring of sequences `X[0…m-1]` and `Y[0…n-1]` def LCS(X, Y, m, n): maxLength = 0 # stores the max length of LCS endingIndex = m # stores the ending index of LCS in `X` # `lookup[i][j]` stores the length of LCS of substring `X[0…i-1]` and `Y[0…j-1]` lookup = [[0 for x in range(n + 1)] for y in range(m + 1)] # fill the lookup table in a bottom-up manner for i in range(1, m + 1): for j in range(1, n + 1): # if the current character of `X` and `Y` matches if X[i - 1] == Y[j - 1]: lookup[i][j] = lookup[i - 1][j - 1] + 1 # update the maximum length and ending index if lookup[i][j] > maxLength: maxLength = lookup[i][j] endingIndex = i # return longest common substring having length `maxLength` return X[endingIndex - maxLength: endingIndex] if __name__ == '__main__': X = 'ABC' Y = 'BABA' m = len(X) n = len(Y) # Find longest common substring print('The longest common substring is', LCS(X, Y, m, n)) |
Output:
The longest common substring is AB
The time complexity of the above solution is O(m.n) and requires O(m.n) extra space, where m and n are the length of the strings X and Y, respectively. The space complexity of the above solution can be improved to O(n) as calculating LCS of a row of the LCS table requires only the solutions to the current row and the previous row. We can also store only non-zero values in the rows. We can do this using hash tables instead of arrays.
We can also solve this problem in O(m + n) time by using a generalized suffix tree. We will be soon discussing the suffix tree approach in a separate post.
Exercise: Write space optimized code for iterative version.
References: https://en.wikipedia.org/wiki/Longest_common_substring_problem
Thanks for reading.
To share your code in the comments, please use our online compiler that supports C, C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages.
Like us? Refer us to your friends and support our growth. Happy coding :)