All diff tools I've found are just comparing line by line instead of char by char. Is there any library that gives details on single line strings? Maybe also a percentage difference, though I guess there are separate functions for that?
-
Isn't this a duplicate of http://stackoverflow.com/questions/1721738/using-diff-or-anything-else-to-get-character-level-diff-between-text-files ? – Aleksandr Levchuk May 07 '11 at 21:35
4 Answers
This algorithm diffs word-by-word:
http://github.com/paulgb/simplediff
available in Python and PHP. It can even produce HTML-formatted output using the <ins> and <del> tags.
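
If you would rather not add a dependency, the same idea can be sketched with the standard library. This is not simplediff's code: it swaps in `difflib.SequenceMatcher`, and the function name `html_word_diff` is made up for illustration. It tokenizes into runs of whitespace and runs of non-whitespace, so a tab replaced by a space still shows up as a change.

```python
import difflib
import html
import re


def html_word_diff(old, new):
    """Word-level diff rendered with <del>/<ins> tags (hypothetical helper)."""
    # Keep whitespace runs as tokens of their own so they are compared too.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape("".join(a[i1:i2]))
        added = html.escape("".join(b[j1:j2]))
        if tag == "equal":
            out.append(removed)
        else:
            # "replace" has both parts, "delete" only removed, "insert" only added.
            if removed:
                out.append(f"<del>{removed}</del>")
            if added:
                out.append(f"<ins>{added}</ins>")
    return "".join(out)


print(html_word_diff("the quick fox", "the slow fox"))
```

Passing the two strings themselves instead of the token lists would give a character-level variant, since strings are sequences too.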

-
Good, but whitespace should matter too. A tab replaced by a space would be a difference not picked up by this. – Tor Valamo Jan 09 '10 at 20:44
-
The source code looks simple enough. You can easily change it to split on empty string instead of whitespace so you can diff character-by-character. – slebetman Jan 09 '10 at 21:07
-
Actually this one works great if you pass the strings directly to diff() instead of going through stringDiff(). It works nicely on a char-by-char basis, because strings are sequences in Python, and the output of the function is easy to work with too. I'm wondering about the overhead of looking for the largest common substring, though, when each item is only one char... but I may be misunderstanding the code. – Tor Valamo Jan 09 '10 at 21:22
I was looking for something similar recently, and came across wdiff. It operates on words, not characters, but is this close to what you're looking for?

-
Good, but whitespace should matter too. A tab replaced by a space would be a difference not picked up by this (if split by whitespace). – Tor Valamo Jan 09 '10 at 20:48
-
@lhf, is it abandoned, or is there simply not much left to improve? – Aleksandr Levchuk Apr 15 '11 at 00:06
-
@Aleksandr, I see now that wdiff was revived soon after I posted that comment. See http://ftp.gnu.org/gnu/wdiff/ – lhf Apr 15 '11 at 02:11
-
@lhf, Nice! 16 years of no development and now back in the game. – Aleksandr Levchuk Apr 15 '11 at 04:40
What you could try is to split both strings into one character per line and then run diff on the result. It's a dirty hack, but at least it should work, and it is quite easy to implement.
Alternatively, you can split each string into a list of chars in Python and use difflib. Check the Python difflib reference.
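
As a rough sketch of the difflib route (the helper names `char_diff` and `similarity_percent` are my own, not part of difflib): a Python string is already a sequence of characters, so it can be handed to `SequenceMatcher` without splitting. `get_opcodes()` gives the per-character edits, and `ratio()` gives a similarity score between 0 and 1 that can serve as the percentage asked about in the question.

```python
import difflib


def char_diff(old, new):
    """Return (tag, old_piece, new_piece) tuples describing how old becomes new."""
    # autojunk=False turns off the popularity heuristic, which can hide
    # matches on long inputs with many repeated characters.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def similarity_percent(old, new):
    """Similarity as a percentage: 2 * matched chars / total chars * 100."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return 100.0 * matcher.ratio()


# A tab replaced by a space is reported as a one-character "replace".
for op in char_diff("a\tb", "a b"):
    print(op)
print(round(similarity_percent("a\tb", "a b"), 1))
```

Note that `ratio()` measures similarity, so the "percentage difference" would be 100 minus this value.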

-
I thought of this, and it looks like the "best" option so far. I've also considered looking into the line diff tools and trying to make them treat chars as lines instead... but I thought I'd check first. – Tor Valamo Jan 09 '10 at 20:24
-
That can be done easily for words with `diff <(cat file1 | tr " " "\n") <(cat file2 | tr " " "\n")`, but the problem is that the output is poorly formatted. It is much better to do `wdiff file1 file2`, thanks to @Michael Williamson's answer. – Aleksandr Levchuk Apr 15 '11 at 00:02
-
Here is a character-by-character version (GNU sed): `diff <(cat a1 | sed 's/./&\n/g') <(cat a2 | sed 's/./&\n/g')` – Aleksandr Levchuk Apr 15 '11 at 00:10
You can implement a simple Needleman–Wunsch algorithm, which computes a global alignment of two sequences. The pseudocode is available on Wikipedia: http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
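
For illustration, here is a minimal Python sketch of that algorithm. The scoring values (match +1, mismatch -1, gap -1) and the use of `-` as the gap marker are assumptions chosen for the example, not something the algorithm prescribes, so tune them to taste.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two chars
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")  # assumed gap marker; ambiguous if inputs contain "-"
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("abc", "ac"))
```

Columns where the two aligned strings differ, or where one side holds the gap marker, are the character-level differences. The table takes O(len(a) * len(b)) time and memory, which is fine for single-line strings.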
