
I want to find the differences between two files and then put only the differences in a third file. I have seen different approaches using awk, diff and comm. Are there any others?

e.g. Compare two files line by line and generate the difference in another file

e.g. Copy differences between two files in unix

I need to know the fastest way of finding all the differences and listing them in a file for each of the cases below:

Case 1 - file2 = file1 + extra text appended.
Case 2 - file2 and file1 are different.
Steam

4 Answers


You could try:

comm -13 <(sort file1) <(sort file2) > file3

or

grep -Fxvf file1 file2 > file3

or

diff file1 file2 | grep '^>' | sed 's/^> //' > file3

or

join -v 2 <(sort file1) <(sort file2) > file3
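The `<(...)` process substitution used above is a bash/ksh/zsh feature and does not work in a plain POSIX `sh`. A minimal sketch of the `comm -13` method using temporary files instead, assuming `mktemp` is available; the function name `only_in_second` is hypothetical. It prints the lines that appear in the second file but not in the first:

```shell
# only_in_second FILE1 FILE2 (hypothetical helper): print the lines that
# occur in FILE2 but not in FILE1, comparing whole lines.
# The body runs in a subshell ( ... ) so its variables do not leak out.
only_in_second() (
    dir=$(mktemp -d) || exit 1
    # comm needs both inputs sorted with the same collation, so pin LC_ALL=C.
    LC_ALL=C sort "$1" > "$dir/a"
    LC_ALL=C sort "$2" > "$dir/b"
    # -1 hides lines only in FILE1, -3 hides lines common to both,
    # leaving column 2: lines only in FILE2.
    LC_ALL=C comm -13 "$dir/a" "$dir/b"
    status=$?
    rm -rf "$dir"
    exit "$status"
)
```

Usage: `only_in_second file1 file2 > file3`.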
danmc
  • Using two large text files where one has an extra paragraph of text near the beginning, I timed all four methods. The grep, diff, and join methods all failed to find the extra paragraph. The diff method needs to grep for ">" in addition to "<" to work. I'm not familiar with the grep or join methods. The results: comm: 3.661s, grep: 0.035s, diff: 0.051s, join: 3.811s – Jason Hartley Dec 31 '14 at 16:52
  • Your answer is wrong. Find what is missing in file1 from file2. Right answer: comm -3 <(sort file1) <(sort file2) | tr -d '\t' – binbjz Mar 27 '19 at 15:55

Another option:

sort file1 file2 | uniq -u > file3

If you want to see just the duplicate entries, use the "uniq -d" option:

sort file1 file2 | uniq -d > file3
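One caveat with `uniq -u`: a line that is repeated inside a single file is also dropped, even if it never appears in the other file. A minimal sketch that de-duplicates each file first so only cross-file matches cancel out; `sym_diff` is a hypothetical name, and `LC_ALL=C` is set only so every `sort` uses the same byte order:

```shell
# sym_diff FILE1 FILE2 (hypothetical helper): print lines that occur in
# exactly one of the two files. Each file is de-duplicated with sort -u
# first, so a line repeated inside one file is not mistaken for a line
# shared by both files.
sym_diff() (
    export LC_ALL=C
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
)
```

For example, if `file1` contains the lines `x`, `x`, `y` and `file2` contains `y`, `z`, then `sort file1 file2 | uniq -u` prints only `z`, whereas `sym_diff file1 file2` prints `x` and `z`.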
pron
  • I like this answer the best because it is straightforward, intuitive, and doesn't involve some complex command line options/syntax. – wisbucky Aug 28 '19 at 22:15
  • Note: one distinction is that for a line that differs, this `uniq` solution prints both the `file1` and `file2` versions of the line, while the `comm` and `grep` solutions print only the `file2` version. – wisbucky Aug 28 '19 at 22:29

You could also compute an MD5 hash (or a similar checksum) of each file to determine whether there are any differences at all, and then compare only the files whose hashes differ.
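A minimal sketch of the checksum idea, assuming GNU coreutils `md5sum` is installed (on macOS, `md5 -q FILE` prints just the hash instead). `diff_if_changed` is a hypothetical name; when the hashes differ it falls back to the `grep -Fxvf` method from the first answer. For a single pair of files, `cmp -s file1 file2` answers the same yes/no question without hashing; checksums pay off mainly when you store them and compare many files against them.

```shell
# diff_if_changed FILE1 FILE2 (hypothetical helper): print the lines of FILE2
# that are not in FILE1, but skip the comparison when the MD5 sums match.
# Assumes GNU md5sum; reading from stdin makes its output "HASH  -".
diff_if_changed() (
    h1=$(md5sum < "$1" | cut -d ' ' -f 1)
    h2=$(md5sum < "$2" | cut -d ' ' -f 1)
    if [ "$h1" = "$h2" ]; then
        exit 0    # same checksum: treat the files as identical
    fi
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from FILE1
    grep -Fxvf "$1" "$2"
)
```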

P_M

This will work fast:

Case 1 - File2 = File1 + extra text appended.

grep -Fxvf File1.txt File2.txt > File3.txt

File 1: 80 lines, File 2: 100 lines, File 3: 20 lines
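Because in Case 1 the new text is appended at the end, you can also skip line matching entirely and copy the bytes of the second file that come after the length of the first. A minimal sketch, assuming the second file really is the first plus appended text (it checks this with `cmp`) and that `head -c` is available (GNU and BSD provide it, though it is not in POSIX). The name `appended_part` is hypothetical. Unlike `grep -Fxvf`, this also keeps appended lines that happen to repeat a line already present in the first file.

```shell
# appended_part FILE1 FILE2 (hypothetical helper): print whatever FILE2 has
# after the first $(wc -c < FILE1) bytes, i.e. the text appended to FILE1.
appended_part() (
    # $(( ... )) strips the leading spaces some wc implementations print.
    n=$(( $(wc -c < "$1") ))
    # Make sure FILE2 really starts with the complete contents of FILE1.
    if ! head -c "$n" "$2" | cmp -s - "$1"; then
        echo "appended_part: $2 does not start with $1" >&2
        exit 1
    fi
    # tail -c +K starts output at byte K (1-based), so skip exactly n bytes.
    tail -c +"$((n + 1))" "$2"
)
```

Usage: `appended_part File1.txt File2.txt > File3.txt`.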