How can I remove common occurrences between 2 text files using the unix environment?

Question

OK, so I'm still learning command-line tools like grep and diff and how they fit into my project, but I can't wrap my head around how to approach this problem.

So I have two files, each containing hundreds of 20-character strings; let's call them A and B. Using the values in B as keys, I want to search through A and find the unique strings that occur in A but not in B. (There are duplicates, so uniqueness is the key here.)

Any ideas?

Also, I'm not opposed to finding the answer myself, but I don't yet understand the various command-line tools and what they do well enough to start thinking about how to combine them.

possible duplicate of [Unix command to find lines common in two files](http://stackoverflow.com/questions/373810/unix-command-to-find-lines-common-in-two-files) — Jonathan Leffler, Jan 22 '14 at 20:59
Verdammelt, thanks for the help! I like your answer because it shows me proper usage of grep and also introduces sort which is something I didn't know I could do. — user3225219, Jan 24 '14 at 01:25

score 1 · Answer 1 · edited May 23 '17 at 12:11


Look up the `comm` command (POSIX `comm`) to do this. See also [Unix command to find lines common in two files](http://stackoverflow.com/questions/373810/unix-command-to-find-lines-common-in-two-files).
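A minimal POSIX sh sketch of the `comm` approach for this question, assuming the files are named A and B as in the question. The helper name `only_in_a` is hypothetical, not a standard command. Because `comm` only works on sorted input, both files are run through `sort -u` first, which also removes the duplicates; `mktemp` is assumed to be available (it is on Linux and macOS, although POSIX does not specify it).

```shell
#!/bin/sh
# only_in_a is a hypothetical helper name, not a standard command.
# Usage: only_in_a FILE_A FILE_B
# Prints each distinct line of FILE_A that never appears as a line of FILE_B.
only_in_a() {
    # Temporary file for the sorted, de-duplicated copy of FILE_B
    # (assumes mktemp is available).
    sorted_b=$(mktemp) || return 1
    sort -u "$2" > "$sorted_b" || { rm -f "$sorted_b"; return 1; }

    # sort -u sorts FILE_A and drops its duplicates; "-" makes comm read
    # that stream from standard input.
    # -2 hides lines found only in FILE_B, -3 hides lines common to both,
    # leaving column 1: lines found only in FILE_A.
    sort -u "$1" | comm -23 - "$sorted_b"
    status=$?

    rm -f "$sorted_b"
    return "$status"
}
```

Called as `only_in_a A B > only_in_A.txt`, it writes each string that appears in A but not in B exactly once, in sorted order.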


answered Jan 22 '14 at 20:58

Jonathan Leffler


verdammelt · Accepted Answer · 2014-01-23T01:53:15.257


There are two ways to do this: with `comm`, or with `grep`, `sort`, and `uniq`.

`comm`

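comm -2 -3 afile bfile

comm compares two sorted files and prints three columns: lines only in afile, lines only in bfile, and lines common to both. The -2 and -3 switches tell comm not to print the second and third columns, so only the lines unique to afile remain. Both files must be sorted the same way for comm to work, and because the input contains duplicates, sort each file with sort -u first (for example, sort -u afile > afile.sorted).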
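To see the three columns, here is a small sketch with made-up contents for afile and bfile (run it in a scratch directory, since it creates both files). The inputs are already in sorted order, as comm requires.

```shell
#!/bin/sh
# Made-up sample data, already in sorted order.
printf '%s\n' apple banana cherry > afile
printf '%s\n' banana date > bfile

# All three columns. Column 1 (only in afile) has no prefix,
# column 2 (only in bfile) is indented by one tab,
# column 3 (in both files) by two tabs.
comm afile bfile

# Suppress columns 2 and 3: prints apple and cherry.
comm -23 afile bfile
```

The first command prints apple, then banana after two tabs, then cherry, then date after one tab; the second prints only apple and cherry.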

`grep` `sort` `uniq`

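grep -F -x -v -f bfile afile | sort | uniq

or just

grep -F -x -v -f bfile afile | sort -u

if your sort handles the -u option. Here -F treats each line of bfile as a fixed string rather than a regular expression, -x makes it match only a whole line of afile, -v keeps the lines that do not match, and -f bfile reads the patterns from bfile.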
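By default grep -F matches each pattern anywhere inside a line, so a short entry in bfile can knock out a longer line of afile; -x restricts matches to whole lines. A sketch with made-up data (it creates afile and bfile in the current directory) shows the difference:

```shell
#!/bin/sh
# Made-up sample data: "abc" appears twice in afile, and bfile contains "ab",
# which is a substring of "abc" but not a whole line of afile.
printf '%s\n' abc abc def ghi > afile
printf '%s\n' ab def > bfile

# Without -x, "ab" matches inside "abc", so abc is dropped: prints only ghi.
grep -F -v -f bfile afile | sort -u

# With -x, patterns must match entire lines: prints abc and ghi, each once.
grep -F -x -v -f bfile afile | sort -u
```

The same reasoning applies to a stray blank line in bfile: without -x, an empty pattern matches every line of afile, so the -v pipeline prints nothing at all.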

(Note: the fgrep command, if your system has it, is equivalent to grep -F.)

edited Jan 23 '14 at 01:53

answered Jan 22 '14 at 21:45

verdammelt


Or `grep -F` in lieu of `fgrep`. The official POSIX standard for [`grep`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html) no longer includes `egrep` (`grep -E`) or `fgrep` (`grep -F`) -- but real world implementations still include the original names, possibly as alternative names to the main `grep` binary. – Jonathan Leffler Jan 22 '14 at 22:00
