8

The standard diff tool is very useful for finding lines in a file that differ, but it doesn't work well for character-by-character differences. I often need to merge texts (i.e. written text, not code) character by character after they were modified without synchronization on different computers (yes, I know I shouldn't, but it happens anyway). Apart from adding a paragraph or two, I might have changed a comma, fixed a spelling mistake, or made some other small change in text that was previously common to both files.

Diff will tell me what lines are changed, but since there might be multiple diffs per line, I must carefully scan the lines to find each physically small but important diff per line. After fixing, I must repeat the diff to make sure I didn't miss any edits. It gets even worse when the lines are paragraph formatted (i.e. one line per paragraph), and when many consecutive lines have such small differences.

Right now I must admit that I usually just load both files into Microsoft Word and use its built-in diff function. It is of course inconvenient to start a huge package like Word just to find some small differences, but at least it compares files on a character-by-character basis.

What I really want is a Unix way of doing this: a small and cute tool or script that does character-by-character comparisons on text (i.e. not line based), can ignore line endings, reports with some sensible ASCII art, and is fully pipeable for use in scripts from the command line.

There is another question about this, Using 'diff' (or anything else) to get character-level diff between text files, but that question was satisfied by a library exemplified by a web-based tool. I would prefer something on the command line.

00prometheus
  • The same question also mentions python [difflib](http://docs.python.org/library/difflib.html) and a [command line](http://docs.python.org/library/difflib.html#a-command-line-interface-to-difflib) interface to it. Did you try that? – devnull Oct 06 '13 at 14:20
  • No, I didn't notice that, I will look into it! I was hoping there was a maintained standard package for this (i.e. Duh! Everyone but you knows that you should use: ...), but it seems to be a trickier problem than I thought. – 00prometheus Oct 07 '13 at 23:00
  • You could put each character into its own line and use a diff-tool on it. – Niklas R Oct 12 '13 at 15:09
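The last two comments can be combined into a rough sketch: put each character on its own "line" and run a line-based diff over the result, here with Python's difflib rather than the external diff program. The function names `char_lines` and `char_unified_diff` are made up for this illustration, and `repr()` is used only so that spaces and newlines stay visible in the output.

```python
import difflib


def char_lines(text):
    # One "line" per character; repr() keeps spaces and newlines visible.
    return [repr(ch) for ch in text]


def char_unified_diff(old, new):
    """Return unified-diff lines in which every diff line is one character."""
    return list(
        difflib.unified_diff(
            char_lines(old),
            char_lines(new),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=2,  # two characters of context around each change
        )
    )


if __name__ == "__main__":
    for line in char_unified_diff("Hello, world", "Hello world!"):
        print(line)
```

The output is an ordinary unified diff, so it is pipeable, but it is awkward to read for long texts because every character takes a whole line.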

2 Answers

5

I'm not sure if this will meet your "command-line" criteria, but I use gvim / vim daily for this purpose.

  1. Open the files you want to diff like this:

    gvim -d file1 file2
    
  2. Make the window full-screen so it's easier to see

  3. Make the split-windows inside gvim equal size with the command: C-w = (that's Control+W and then =)

  4. To see paragraph-formatted lines better, enter :set wrap, then switch to the other split window with C-w w (or with a mouse click) and enter :set wrap there too

  5. To move between changes, use [c and ]c. To merge changes, use dp ("diff put") and do ("diff obtain/get").

Lines with differences are highlighted, and the differences within the line are also highlighted with another color. I hope this does what you need. gvim can do even more for you, such as merging from one file to the other. You can find out more with the command :help diff (inside gvim).

You can also try kdiff3; it might be easier than learning vim.

janos
  • Thanks Janos, I didn't know that vim could do that! It is a lot quicker to start vim than Word, and the vim -d mode does everything that Word does. I still wish for a pure command-line tool, so that I can use it in pipes and so on, but maybe there simply isn't a standard tool for what I want. I am sorry that I have too few points to up-vote your reply, but maybe someone else could? – 00prometheus Oct 07 '13 at 22:51
0

It seems the closest we can get is the vimdiff answer by janos, though it isn't command-line.

A close alternative is wdiff: it is well supported, included in major distributions (such as Debian, and even Cygwin), command-line and pipeable, and able to ignore line endings. wdiff can be used in much the same way as standard diff. Unfortunately, it is word based, not character based.

For human use, wdiff is probably close enough; finding a single character mismatch within a word is quick and easy. The main disadvantage is that it cannot be used in programs and scripts if the purpose is to find single characters.

There doesn't appear to be any supported command-line character-based diff :-(.
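For scripts that really need single characters, one possible stopgap is a few lines of Python around difflib.SequenceMatcher. This is only a sketch, not a maintained tool: the function name `inline_char_diff` is invented here, the `[-...-]` and `{+...+}` markers are borrowed from wdiff's default output style, and line endings are compared like any other character rather than ignored.

```python
import difflib
import sys


def inline_char_diff(old, new):
    """Mark character-level changes as [-deleted-] and {+inserted+}."""
    out = []
    # autojunk=False turns off difflib's heuristic for long sequences.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)


# Assumed command-line use: python chardiff.py OLD_FILE NEW_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f_old, \
            open(sys.argv[2], encoding="utf-8") as f_new:
        sys.stdout.write(inline_char_diff(f_old.read(), f_new.read()))
```

For example, `inline_char_diff("Hello, world", "Hello world!")` returns `Hello[-,-] world{+!+}`. The result goes to standard output, so it can be piped onward, though SequenceMatcher can get slow on large files.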

00prometheus