Implement your diff utility, i.e., given two similar strings, efficiently list out all differences between them.

The diff utility is a data comparison tool that calculates and displays the differences between the two texts. It tries to determine the smallest set of deletions and insertions and create one text from the other. Diff is line-oriented rather than character-oriented, unlike edit distance.

 
For example,

Input:
 
string X = XMJYAUZ
string Y = XMJAATZ
 
Output:
 
X M J -Y A -U +A +T Z
 
(- indicates that character is deleted from Y but it was present in X)
(+ indicates that character is inserted in Y but it was not present in X)

We can use the Longest Common Subsequence (LCS) to solve this problem. The idea is to find the longest sequence of characters present in both original sequences in the same order. From the longest common subsequence, it is only a small step to get the diff-like output:

  • If a character is absent in the subsequence but present in the first original sequence, it must have been deleted (indicated by the - marks).
  • If it is absent in the subsequence but present in the second original sequence, it must have been inserted (indicated by the + marks).

Following is the C++, Java, and Python implementation of the idea:

C++


Download  Run Code

Output:

A B C D +E F G -H +I J -Q +K +R +X +Y Z

Java


Download  Run Code

Output:

A B C D +E F G -H +I J -Q +K +R +X +Y Z

Python


Download  Run Code

Output:

A B C D +E F G -H +I J -Q +K +R +X +Y Z

The time complexity of the above solution is O(m.n) and requires O(m.n) extra space, where m is the length of the first string and n is the length of the second string.

 
Exercise: Modify above solution to find differences by reading the lookup table in a bottom-up manner.

 
References: https://en.wikipedia.org/wiki/Diff_utility