47

I have 2 files with a list of numbers (telephone numbers).

I'm looking for a way to list the numbers in the second file that are not present in the first file.

I've tried various methods:

comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result, there should be more)
pb2q
mvrasmussen
  • Have you checked this answer: http://stackoverflow.com/a/1617326/15165 ? BTW: before doing anything make sure you have got all the trailing lines and extra blank spaces removed. This could be the reason you have not found all of them... – bcelary Jun 19 '12 at 11:28
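A minimal sketch of the clean-up suggested in the comment above, assuming the usual culprits are carriage returns, trailing whitespace and empty lines. The file names and sample numbers are made up for illustration:

```shell
# Hypothetical sample: Windows line endings, trailing blanks and an empty line.
workdir=$(mktemp -d)
printf '5551234\r\n5555678  \n\n5559999\n' > "$workdir/raw.txt"

# Strip carriage returns and trailing whitespace, then drop empty lines.
tr -d '\r' < "$workdir/raw.txt" \
  | sed 's/[[:space:]]*$//' \
  | grep -v '^$' > "$workdir/clean.txt"

cleaned=$(cat "$workdir/clean.txt")
printf '%s\n' "$cleaned"
rm -rf "$workdir"
```

Running both files through the same clean-up before comparing them means that whole-line matching is not thrown off by invisible characters.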

4 Answers

88
grep -Fxv -f first-file.txt second-file.txt

Basically looks for all lines in second-file.txt which don't match any line in first-file.txt. Might be slow if the files are large.
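To illustrate what the flags do, here is a small sketch on made-up data (the numbers are assumptions, only the file names come from the question). Note that 555123 is a substring of 5551234, so it is only reported as missing because -x forces whole-line matches:

```shell
workdir=$(mktemp -d)
printf '5551234\n5555678\n' > "$workdir/first-file.txt"
printf '5551234\n5559999\n555123\n' > "$workdir/second-file.txt"

# -F: treat patterns as fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -v: print the lines that do NOT match
# -f: read the patterns from first-file.txt, one per line
missing=$(grep -Fxv -f "$workdir/first-file.txt" "$workdir/second-file.txt")
printf '%s\n' "$missing"
rm -rf "$workdir"
```

This should print 5559999 and 555123, the two lines of second-file.txt that have no exact counterpart in first-file.txt.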

Also, once you sort the files, comm should also have worked. Use plain sort rather than sort -n, even for numbers, because comm expects its input in lexicographic order. What error does it give? Try this:

comm -23 second-file-sorted.txt first-file-sorted.txt
Hari Menon
  • Seems to do the trick. It took only a couple of seconds, with about 500,000 lines in the two files combined – mvrasmussen Jun 19 '12 at 11:41
  • 1
    Cool, 500k should be fine on modern machines. But I wouldn't have imagined it would be THAT fast..! Did the comm thing work? – Hari Menon Jun 19 '12 at 11:44
  • 1
    Warning, you can't use sort -n with comm, see my test – Nahuel Fouilleul Jun 21 '12 at 11:09
  • The grep solution works only if `second-file.txt` is not empty. – Kamil S Jaron Jan 30 '17 at 12:28
  • what does the -23 flag mean? – TEK Aug 11 '20 at 06:20
  • According to the [man page](https://linux.die.net/man/1/comm), -23 suppresses columns 2 and 3, so only one column is printed instead of the default three: the lines unique to second-file-sorted.txt – Charlie Oct 29 '21 at 14:15
  • For those wondering how to do the opposite, just remove the -v (invert-match) – kaios Nov 29 '21 at 16:59
  • I tried the grep vs the comm on 2 large text files that were sorted unique and compared case-insensitive. Each file had only email addresses. The comm command did not find all the matches, but the grep command did. The grep command took considerably longer. The difference in results was not insignificant. – TekOps Jul 25 '22 at 19:11
31

You need to use comm:

comm -13 first.txt second.txt

will do the job.

P.S. The order of the first and second file on the command line matters.

You may also need to sort the files first:

comm -13 <(sort first.txt) <(sort second.txt)

Even if the files are numeric, don't add the -n option to sort: comm expects both inputs in plain lexicographic order.
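A rough sketch of the sort-then-comm approach on made-up numbers. Two things here are my own assumptions rather than part of the answer: temporary sorted files replace the `<(...)` process substitution so the snippet also runs in a plain POSIX shell, and LC_ALL=C pins sort and comm to the same byte-wise ordering:

```shell
workdir=$(mktemp -d)
printf '5559999\n5551234\n5555678\n' > "$workdir/first.txt"
printf '5550000\n5555678\n5551111\n' > "$workdir/second.txt"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$workdir/first.txt"  > "$workdir/first.sorted"
LC_ALL=C sort "$workdir/second.txt" > "$workdir/second.sorted"

# -1 hides lines found only in the first file, -3 hides lines common to both,
# which leaves column 2: lines found only in the second file.
only_in_second=$(LC_ALL=C comm -13 "$workdir/first.sorted" "$workdir/second.sorted")
printf '%s\n' "$only_in_second"
rm -rf "$workdir"
```

With this sample data the output should be 5550000 and 5551111. Swapping the two file arguments would list the numbers unique to first.txt instead.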

rush
  • That results in `comm: file 2 is not in sorted order` and `comm: file 1 is not in sorted order`, plus a list with exactly the same number of lines as file2 – mvrasmussen Jun 19 '12 at 11:31
  • So you can try sorting them first. I've just added a variant with `comm` + `sort`. – rush Jun 19 '12 at 11:44
  • 2
    Keep in mind that sorting the files numerically may not work, as comm expects them to be sorted lexicographically. – chepner Jun 19 '12 at 12:34
14

This should work

comm -13 <(sort file1) <(sort file2)

It seems sort -n (numeric) does not work with comm, which expects its input in the lexicographic order that plain sort produces

f1.txt

1
2
21
50

f2.txt

1
3
21
50

21 should appear in third column

#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)   
                1
2
21
        3
        21
                50

#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
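If the result should still come out in numeric order, one option is to sort lexicographically for comm and apply sort -n only to comm's output. This is a sketch under assumptions: f2.txt has an extra line 100 added to make the ordering visible, and temporary files plus LC_ALL=C are my choices for portability:

```shell
workdir=$(mktemp -d)
printf '1\n2\n21\n50\n' > "$workdir/f1.txt"
printf '1\n3\n21\n50\n100\n' > "$workdir/f2.txt"   # 100 is an added sample value

# Lexicographic sort for comm's benefit.
LC_ALL=C sort "$workdir/f1.txt" > "$workdir/f1.sorted"
LC_ALL=C sort "$workdir/f2.txt" > "$workdir/f2.sorted"

# comm emits 100 before 3 (lexicographic order); sort -n reorders the result only.
only_in_f2=$(LC_ALL=C comm -13 "$workdir/f1.sorted" "$workdir/f2.sorted" | sort -n)
printf '%s\n' "$only_in_f2"
rm -rf "$workdir"
```

Here the lines unique to f2.txt should be printed as 3 and then 100, while 21 is correctly treated as common to both files.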
Nahuel Fouilleul
1
cat f1.txt f2.txt | sort |uniq > file3
David Arenburg
tom
  • 1
    Unfortunately, this provides the unique list of all lines in both files, and the requester is seeking only different lines from file 2. – ingyhere May 01 '15 at 00:50