find difference between two text files with one item per line

Question

I have two files:

file 1

dsf
sdfsd
dsfsdf

file 2

ljljlj 
lkklk 
dsf
sdfsd
dsfsdf

I want to display what is in file 2 but not in file 1, so file 3 should look like

ljljlj 
lkklk

score 166 · Answer 1 · answered Nov 02 '10 at 15:19

grep -Fxvf file1 file2

What the flags mean:

-F, --fixed-strings
              Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.    
-x, --line-regexp
              Select only those matches that exactly match the whole line.
-v, --invert-match
              Invert the sense of matching, to select non-matching lines.
-f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  The empty file contains zero patterns, and therefore matches nothing.
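To see why `-F` and `-x` matter, here is a minimal sketch that recreates the sample files in a scratch directory. The extra `dsfblah` line in file2 and the variable names `exact` and `loose` are assumptions added for illustration only.

```shell
# Recreate the question's sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' dsf sdfsd dsfsdf > "$dir/file1"
printf '%s\n' ljljlj lkklk dsf sdfsd dsfsdf dsfblah > "$dir/file2"

# Whole-line, fixed-string match: only exact copies of file1 lines are dropped.
exact=$(grep -Fxvf "$dir/file1" "$dir/file2")

# Without -F and -x each file1 line is a regex matched anywhere in a line,
# so "dsfblah" is dropped as well because it contains "dsf".
loose=$(grep -vf "$dir/file1" "$dir/file2")

printf 'exact:\n%s\n\nloose:\n%s\n' "$exact" "$loose"
rm -r "$dir"
```

With these inputs the exact form keeps `dsfblah`, while the loose form silently loses it.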

answered Nov 02 '10 at 15:19

dogbane

Option `-n` could be added to number the differing lines – boczniak767 Nov 27 '14 at 12:59
Any way to highlight the non-matching part of each line? – PeterVermont May 06 '15 at 19:56
With this you can find the first difference only and print its line number too: `grep -m 1 -Fnxvf file1 file2` – Paolo M Oct 20 '15 at 15:50
Very inefficient on large files. – Ain Tohvri Dec 07 '15 at 11:14

score 61 · Accepted Answer · edited Jun 01 '15 at 19:14

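You can try

grep -f file1 file2

or

grep -v -F -x -f file1 file2

The first prints the lines of file2 that match a line of file1 used as a pattern, which for the sample data are the three lines the files share. The second prints what the question asks for: the lines of file2 that do not occur as whole lines in file1.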

edited Jun 01 '15 at 19:14

jopasserat

answered Nov 02 '10 at 15:05

krico

This won't work. Try adding `dsfblah` to file2. – dogbane Nov 02 '10 at 15:22
You can fix it with `grep -F -x` – tripleee May 30 '13 at 14:50
I think your suggestion was worth editing into the answer, @tripleee – jopasserat Jun 01 '15 at 19:15
Note that the ordering of the files matters. I'm trying to detect a new addition to a file. I have to write `grep -v -f oldfile newfile` or else it will output nothing. – Marvo Feb 06 '18 at 21:31
Imagine me: git add file1. git commit. cat file2 > file1. git diff. – Dec 02 '18 at 02:44
@krico Can you add an explanation of the parameters you pass? – Raghvendra Jan 30 '19 at 06:24

score 47 · Answer 3 · answered Nov 02 '10 at 15:29

You can use the comm command to compare two sorted files:

comm -13 <(sort file1) <(sort file2)
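The command above relies on process substitution, which needs bash or a similar shell. As a rough sketch for a plain POSIX shell, the same comparison can go through sorted temporary files. The `.sorted` file names and the `only_in_file2` variable are assumptions for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' dsf sdfsd dsfsdf > "$dir/file1"
printf '%s\n' ljljlj lkklk dsf sdfsd dsfsdf > "$dir/file2"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$dir/file1" > "$dir/file1.sorted"
LC_ALL=C sort "$dir/file2" > "$dir/file2.sorted"

# Column 1 = only in file1, column 2 = only in file2, column 3 = in both.
# -1 and -3 suppress columns 1 and 3, leaving the lines unique to file2.
only_in_file2=$(LC_ALL=C comm -13 "$dir/file1.sorted" "$dir/file2.sorted")

printf '%s\n' "$only_in_file2"
rm -r "$dir"
```

Pinning `LC_ALL=C` for both sort and comm is one way to make sure the two tools agree on ordering.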

answered Nov 02 '10 at 15:29

dogbane

FYI, it is actually `comm -1 -3 file1 file2`. The two flags `1` and `3` are merged into one. – cevaris Feb 19 '15 at 20:30
`comm -23 <(sort file1) <(sort file2)` will output only the lines that are in file1 and not in file2. The best part is that this works however file2 is arranged, where diff can fail. Say file1 has 1,2,3,4,5 and file2 has 1,2,4,5: you get 3. – user1213320 Mar 31 '17 at 03:07

score 14 · Answer 4 · edited Jun 16 '16 at 13:55

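I successfully used

diff "${file1}" "${file2}" | grep '^<' | sed 's/^< //' > "${diff_file}"

to write the difference to a file. The `<` lines are the ones that appear only in the first file given to diff, so to list what is in file 2 but not in file 1, pass file 2 as the first argument.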

edited Jun 16 '16 at 13:55

answered Nov 02 '12 at 18:57

Luca Borrione

What better way to find differences than to use a diff tool haha. Is there higher overhead with using this versus grep? – Allison Jul 31 '17 at 15:26

score 9 · Answer 5 · answered Nov 02 '10 at 15:09

If you expect the lines to be in the same order in both files, you can just use diff:

diff file1 file2 | grep '^>'

answered Nov 02 '10 at 15:09

Nate

score 7 · Answer 6 · answered Nov 02 '10 at 15:48

join -v 2 <(sort file1) <(sort file2)
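One thing worth knowing before relying on join: by default it matches on the first whitespace-separated field, not the whole line. The sketch below shows it working on the single-word sample data and then disagreeing with a whole-line comparison on multi-word lines. The `apple pie` / `apple tart` lines and the variable names are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' dsf sdfsd dsfsdf | LC_ALL=C sort > "$dir/file1"
printf '%s\n' ljljlj lkklk dsf sdfsd dsfsdf | LC_ALL=C sort > "$dir/file2"

# -v 2 prints only the lines of the second file that have no partner in the first.
single_word=$(LC_ALL=C join -v 2 "$dir/file1" "$dir/file2")

# Caveat: the join key is the first field, so these two different lines
# count as a match and nothing is reported for the second file.
printf '%s\n' 'apple pie' > "$dir/a"
printf '%s\n' 'apple tart' > "$dir/b"
multi_word=$(LC_ALL=C join -v 2 "$dir/a" "$dir/b")

printf 'single-word lines:\n%s\n' "$single_word"
printf 'multi-word lines: [%s]\n' "$multi_word"
rm -r "$dir"
```

For one item per line without spaces, as in the question, this caveat does not apply.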

answered Nov 02 '10 at 15:48

Dennis Williamson

score 4 · Answer 7 · edited May 23 '17 at 12:26

I tried a slight variation on Luca's answer and it worked for me.

diff file1 file2 | grep '^>' | sed 's/^> //' > diff_file

Note that the pattern searched for in sed is a > followed by a space.

edited May 23 '17 at 12:26

Community

answered Jan 30 '14 at 11:14

Riccardo Cicuttini

score 3 · Answer 8 · answered Jan 24 '14 at 13:13

file1 
m1
m2
m3

file2 
m2
m4
m5

>awk 'NR == FNR {file1[$0]++; next} !($0 in file1)' file1 file2
m4
m5

>awk 'NR == FNR {file1[$0]++; next} ($0 in file1)' file1 file2
m2

> What's the awk command to get m1 and m3, i.e. the lines in file1 but not in file2?
m1
m3
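One possible way to get the lines that are in file1 but not in file2 is to swap the roles of the two files, so that file2 is read first and remembered. This is a sketch using the sample data above. The array name `seen` and the variable `only_in_file1` are assumptions, not part of the original answer.

```shell
dir=$(mktemp -d)
printf '%s\n' m1 m2 m3 > "$dir/file1"
printf '%s\n' m2 m4 m5 > "$dir/file2"

# First pass (NR == FNR) records every line of file2; the second pass
# prints the file1 lines that were never recorded.
only_in_file1=$(awk 'NR == FNR {seen[$0]++; next} !($0 in seen)' "$dir/file2" "$dir/file1")

printf '%s\n' "$only_in_file1"
rm -r "$dir"
```

The only change from the first command is the order of the file arguments; the awk program itself is the same apart from the array name.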

score 1 · Answer 9 · answered Nov 02 '10 at 17:26

An awk answer:

awk 'NR == FNR {file1[$0]++; next} !($0 in file1)' file1 file2

answered Nov 02 '10 at 17:26

glenn jackman

score 0 · Answer 10 · edited May 18 '16 at 12:04

With GNU sed:

sed 's#[^^]#[&]#g;s#\^#\\^#g;s#^#/^#;s#$#$/d#' file1 | sed -f- file2

How it works:

The first sed produces an output like this:

/^[d][s][f]$/d
/^[s][d][f][s][d]$/d
/^[d][s][f][s][d][f]$/d

Then it is used as a sed script by the second sed.

edited May 18 '16 at 12:04

answered May 18 '16 at 07:07

Jahid

score 0 · Answer 11 · answered Nov 02 '10 at 16:01

If you want to use loops, you can try something like this (diff and comm are much more efficient):

while IFS= read -r line
do
    flag=0
    # Look for the current file2 line anywhere in file1.
    while IFS= read -r line2
    do
        if [ "$line" = "$line2" ]
        then
            flag=1
            break
        fi
    done < file1
    # Not found in file1: this line is unique to file2.
    if [ "$flag" -eq 0 ]
    then
        echo "$line"
    fi
done < file2 > file3

Note: This program is only meant to give a basic idea of what can be done if you don't want to use external commands such as diff and comm.
