0

I want to find the difference between two strings in java. The difference should be close to any file comparison tools. I have used longest common subsequence algorithm but still it is not pointing the exact expected difference. Any help with this regard will be much appreciated.

kad
  • 49
  • 2
  • 5
  • and how do you define this difference? – Shamim Hafiz - MSFT Jan 12 '12 at 11:29
  • 1
    possible duplicate of [How to perform string Diffs in Java?](http://stackoverflow.com/questions/132478/how-to-perform-string-diffs-in-java) – dogbane Jan 12 '12 at 11:30
  • 1
    That's a pretty vague requirement. Are you looking at the levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance)? Aaache commons lang has an implementation: http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html#getLevenshteinDistance%28java.lang.CharSequence,%20java.lang.CharSequence%29 – JB Nizet Jan 12 '12 at 11:30
  • @kad: you may want to look at my accepted answer here, on how to compute a distance (like a Levenhstein Edit Distance) on large strings (works for small strings too of course): http://stackoverflow.com/questions/8780215/c-sharp-code-or-algorithm-to-quickly-calculate-distance-between-large-strings/8780414#8780414 There's a link to the (optimized) DP of Levenhstein in the comments to my answer. – TacticalCoder Jan 12 '12 at 11:58
  • This question is just too vague in its current form, please clarify your exact requirements then flag to re-open. Thanks. – Kev Jan 12 '12 at 12:23

2 Answers2

4

Maybe you may use the Levenstein algorithm.

http://en.wikipedia.org/wiki/Levenshtein_distance

user1145202
  • 147
  • 4
0

I don't think it's trivial to write yourself, so best use a library such as this: http://code.google.com/p/google-diff-match-patch/ Which is available in all sorts of languages including Java.

If I wanted to figure out how it works, I'd first look at Myer's diff algorithm (which google-diff-match-patch says is the state-of-the-art): http://neil.fraser.name/software/diff_match_patch/myers.pdf

Evgeni Sergeev
  • 22,495
  • 17
  • 107
  • 124