16

On a Linux box I have one file, A.txt, with the contents below:

1
2
3
4

And a second file, B.txt, with the contents below:

1
2
3
6

I want to know which lines are in A.txt but not in B.txt, i.e. it should print the value 4.

I want to do that on Linux.

Andrei
user3190479
  • Not sure why this was considered "unclear" - `comm` is the command you're looking for. In this specific case `comm -23 A.txt B.txt`. – twalberg Jan 17 '14 at 16:52
  • It seems that "unclear" is being used for lack of a "no effort" closer. – showdev Jan 17 '14 at 17:45
  • This is a great question - it described my exact problem in a way that I was able to find on google easily, and it has answers I can use. – rjmunro Sep 03 '15 at 09:51

5 Answers

21
awk 'NR==FNR{a[$0]=1;next}!a[$0]' B A

I haven't tested this, so give it a try.
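
To make the one-liner above easier to follow, here is a commented expansion of the same idea, as a rough sketch rather than a definitive version. The helper name `lines_only_in` and the array name `seen` are made up for illustration, and the sample files are deliberately unsorted to show that order does not matter.

```shell
# Sketch: print the lines of one file that never occur in another.
# lines_only_in is a hypothetical helper name, not a standard command.
lines_only_in() {
  # $1 = file to report from (A), $2 = file whose lines are excluded (B)
  awk '
    # NR == FNR is only true while awk reads the first file on its
    # command line (B): remember every line, then skip to the next one.
    NR == FNR { seen[$0] = 1; next }
    # Second file (A): print lines that were not remembered.
    !seen[$0]
  ' "$2" "$1"
}

# Unsorted sample input with the same values as in the question.
printf '3\n1\n4\n2\n' > A.txt
printf '6\n2\n3\n1\n' > B.txt

lines_only_in A.txt B.txt   # prints 4
```

Note that the `NR == FNR` test assumes the first file (B) is not empty; with an empty B, awk would treat A as the first file and print nothing.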

Kent
  • [Explanation of why it works](https://stackoverflow.com/a/32488079/213816) – nonsleepr Oct 17 '17 at 14:54
  • That would populate `a[]` with the superset of values from both files and so use more memory than necessary. It's better to do `awk 'NR==FNR{a[$0];next} !($0 in a)' B A` instead so it only has to hold the contents of `B` in memory. – Ed Morton Aug 26 '21 at 11:54
21

Use comm if the files are sorted, as your sample input shows:

$ comm -23 A.txt B.txt
4

If the files are unsorted, see @Kent's awk solution.
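
Another option for unsorted files is to sort copies first and then run comm on those. The following is only a sketch under that assumption; the helper name `only_in_first_sorted` is hypothetical, and the output comes back in sorted order rather than in the original order of A.txt.

```shell
# Sketch: sort temporary copies of both files, then compare them with comm.
# only_in_first_sorted is a hypothetical helper name.
only_in_first_sorted() {
  a_sorted=$(mktemp) || return 1
  b_sorted=$(mktemp) || return 1
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  # -2 suppresses lines found only in the second file,
  # -3 suppresses lines common to both, leaving lines unique to the first.
  comm -23 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}

# Unsorted sample input with the same values as in the question.
printf '3\n1\n4\n2\n' > A.txt
printf '6\n2\n3\n1\n' > B.txt

only_in_first_sorted A.txt B.txt   # prints 4
```

In bash, the same thing can be written without temporary files using process substitution: `comm -23 <(sort A.txt) <(sort B.txt)`. Either way, sort and comm need to run under the same locale so that they agree on the ordering.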

Ed Morton
8

You can also do this using grep by combining the -v (show non-matching lines), -x (match whole lines) and -f (read patterns from file) options:

$ grep -v -x -f B.txt A.txt
4

This does not depend on the order of the files - it will remove any lines from A that match a line in B.
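
To illustrate why -x matters, here is a small sketch with made-up sample data (different from the question's files). Without -x, a pattern from B.txt that matches only part of a line in A.txt is enough to remove that line.

```shell
# Hypothetical sample data: "14" contains "4" as a substring.
printf '14\n5\n' > A.txt
printf '4\n' > B.txt

# Without -x, the pattern 4 matches inside 14, so 14 is dropped too.
without_x=$(grep -v -f B.txt A.txt)
printf '%s\n' "$without_x"   # prints 5

# With -x, a pattern must match the whole line, so 14 is kept.
with_x=$(grep -v -x -f B.txt A.txt)
printf '%s\n' "$with_x"      # prints 14 and 5 on separate lines
```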

rjmunro
4

(An addition to @rjmunro's answer)

The proper way to use grep for this is:

$ grep -F -v -x -f B.txt A.txt
4

Without the -F flag, grep interprets each PATTERN read from B.txt as a basic regular expression (BRE), which is undesirable here and can cause trouble. The -F flag makes grep treat the patterns as a set of newline-separated fixed strings. For instance:

$ cat A.txt
&
^
[
]

$ cat B.txt
[
^
]
|

$ grep -v -x -f B.txt A.txt
grep: B.txt:1: Invalid regular expression

$ grep -F -v -x -f B.txt A.txt
&
3

Using diff:

diff --changed-group-format='%<' --unchanged-group-format='' A.txt B.txt
gipsh
  • I am not sure `diff` will work for this problem. Suppose A contains 1-10, while B contains 9-1 and 100-200, all unsorted. The output should be 10. – Kent Jan 17 '14 at 15:16
  • In the question the files are sorted. It won't work on unsorted files, but you can use sort or any other shell command to sort them first. – gipsh Jan 17 '14 at 15:44