I want to find the difference between two strings in java. The difference should be close to any file comparison tools. I have used longest common subsequence algorithm but still it is not pointing the exact expected difference. Any help with this regard will be much appreciated.
Asked
Active
Viewed 457 times
0
-
and how do you define this difference? – Shamim Hafiz - MSFT Jan 12 '12 at 11:29
-
1possible duplicate of [How to perform string Diffs in Java?](http://stackoverflow.com/questions/132478/how-to-perform-string-diffs-in-java) – dogbane Jan 12 '12 at 11:30
-
1That's a pretty vague requirement. Are you looking at the levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance)? Aaache commons lang has an implementation: http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html#getLevenshteinDistance%28java.lang.CharSequence,%20java.lang.CharSequence%29 – JB Nizet Jan 12 '12 at 11:30
-
@kad: you may want to look at my accepted answer here, on how to compute a distance (like a Levenhstein Edit Distance) on large strings (works for small strings too of course): http://stackoverflow.com/questions/8780215/c-sharp-code-or-algorithm-to-quickly-calculate-distance-between-large-strings/8780414#8780414 There's a link to the (optimized) DP of Levenhstein in the comments to my answer. – TacticalCoder Jan 12 '12 at 11:58
-
This question is just too vague in its current form, please clarify your exact requirements then flag to re-open. Thanks. – Kev Jan 12 '12 at 12:23
2 Answers
0
I don't think it's trivial to write yourself, so best use a library such as this: http://code.google.com/p/google-diff-match-patch/ Which is available in all sorts of languages including Java.
If I wanted to figure out how it works, I'd first look at Myer's diff algorithm (which google-diff-match-patch says is the state-of-the-art): http://neil.fraser.name/software/diff_match_patch/myers.pdf

Evgeni Sergeev
- 22,495
- 17
- 107
- 124
-
The Levenstein distance is actually pretty easy to write yourself! – Michael Berry Jan 12 '12 at 11:40
-
1It is, but the straightforward implementation yields a quadratic algorithm, so it will be slow for strings of 100 KB or more. Files can get larger than that. – Evgeni Sergeev Jan 12 '12 at 11:59
-