4

The ever-helpful Wikipedia claims that diff implements Longest Common Subsequence.

This cannot be so. Diff, at least in -y mode, has three types of report: add, remove, and substitute. LCS does not have any concept of 'substitute'.

What is the algorithm of diff? I have reason to not believe that it is Levenshtein distance, but I might have misanalyzed this.

bmargulies
  • 97,814
  • 39
  • 186
  • 310
  • Can't an insertion and a deletion next to each other be considered a substitution? – Max Shawabkeh Feb 11 '10 at 13:35
  • The corresponding source code uses only add and remove. Looks like longest common subsequence at first glance... (See http://git.savannah.gnu.org/cgit/diffutils.git/tree/src/analyze.c?id=fecd0079fe6e15b0f53bf953721d838d9099bf05) – mre Feb 11 '10 at 14:48
  • @mre, diff -y produces 'common' lines indicated by vertical bars. – bmargulies Feb 11 '10 at 16:25
  • I see. My assumption was wrong but at least I learned something along the way :) – mre Feb 11 '10 at 16:49

1 Answers1

2

This answer (by ioplex) says that GNU diff implements "O(ND) diff algorithm" by Eugene Myers.

Community
  • 1
  • 1
P Shved
  • 96,026
  • 17
  • 121
  • 165