Difference between two lists using Bash

Question

OK, I have two related lists in text files on my Linux box:

 /tmp/oldList
 /tmp/newList

I need to compare these lists to see what lines got added and what lines got removed. I then need to loop over these lines and perform actions on them based on whether they were added or removed.

How do I do this in bash?

The same question was asked 4 days before http://stackoverflow.com/questions/11099894/comparing-2-unsorted-lists-in-linux-listing-the-unique-in-the-second-file/11101143#11101143 — Nahuel Fouilleul, Jun 23 '12 at 09:04

score 93 · Accepted Answer · answered Jun 23 '12 at 00:08

Use the comm(1) command to compare the two files. They both need to be sorted, which you can do beforehand if they are large, or you can do it inline with bash process substitution.

comm prints three columns: lines unique to file 1, lines unique to file 2, and lines common to both. It accepts any combination of the flags -1, -2 and -3, each of which suppresses the corresponding column.
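
As a small illustration of those columns, the sketch below runs comm on two made-up, already sorted files. The temporary directory and the fruit names are assumptions for the example, not part of the question.

```shell
# Hypothetical sample data; both files are already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/old"
printf '%s\n' banana cherry date  > "$workdir/new"

# No flags: three tab-indented columns
# (only in old, only in new, in both).
all_columns=$(comm "$workdir/old" "$workdir/new")

# -23 suppresses columns 2 and 3, leaving lines unique to the old file.
only_old=$(comm -23 "$workdir/old" "$workdir/new")

# -13 suppresses columns 1 and 3, leaving lines unique to the new file.
only_new=$(comm -13 "$workdir/old" "$workdir/new")

# -12 suppresses columns 1 and 2, leaving lines common to both files.
in_both=$(comm -12 "$workdir/old" "$workdir/new")

printf 'only old: %s\nonly new: %s\n' "$only_old" "$only_new"
rm -rf "$workdir"
```

Here the files are written in sorted order, so no sort step is needed; with real data the sort shown in the commands that follow is still required.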

To get the lines only in the old file:

comm -23 <(sort /tmp/oldList) <(sort /tmp/newList)

To get the lines only in the new file:

comm -13 <(sort /tmp/oldList) <(sort /tmp/newList)

You can feed that into a while read loop to process each line:

while IFS= read -r old ; do
    printf '%s\n' "$old"   # placeholder: do stuff with "$old" here
done < <(comm -23 <(sort /tmp/oldList) <(sort /tmp/newList))

and similarly for the new lines.
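
Putting both directions together might look like the sketch below. The handler names on_removed and on_added, the process_changes wrapper, and the sample lists are all hypothetical; it sorts into temporary files and pipes comm into the loops, which is one possible arrangement rather than the only one.

```shell
# Hypothetical handlers; replace the bodies with the real actions.
on_removed() { printf 'removed: %s\n' "$1"; }
on_added()   { printf 'added: %s\n' "$1"; }

# process_changes OLD NEW: call a handler for every removed or added line.
process_changes() {
    tmp=$(mktemp -d)
    # Sort once so both comm calls can reuse the results.
    sort "$1" > "$tmp/old"
    sort "$2" > "$tmp/new"

    # IFS= and -r keep leading whitespace and backslashes intact.
    comm -23 "$tmp/old" "$tmp/new" | while IFS= read -r line; do
        on_removed "$line"
    done
    comm -13 "$tmp/old" "$tmp/new" | while IFS= read -r line; do
        on_added "$line"
    done
    rm -rf "$tmp"
}

# Sample stand-ins for /tmp/oldList and /tmp/newList (unsorted on purpose).
workdir=$(mktemp -d)
printf '%s\n' pear apple banana > "$workdir/oldList"
printf '%s\n' banana kiwi pear  > "$workdir/newList"

report=$(process_changes "$workdir/oldList" "$workdir/newList")
printf '%s\n' "$report"
rm -rf "$workdir"
```

Piping into while runs the loop in a subshell, so variables set inside it do not survive the loop. When the handlers need to update shell state, the done < <(...) form is the better fit.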

score 9 · Answer 2 · edited Apr 08 '21 at 18:31

The diff command will do the comparing for you.

e.g.,

$ diff /tmp/oldList /tmp/newList

See the diff man page for more information. This should take care of the first part of your problem.
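
For the second part, one possible approach is to split diff's default output by its line prefixes: lines only in the first file start with "< " and lines only in the second start with "> ". The sketch below assumes already sorted sample files, since diff is sensitive to line order; the file contents and variable names are made up for the example.

```shell
# Hypothetical sample data, already sorted so that reordering is not
# reported as a change.
workdir=$(mktemp -d)
printf '%s\n' apple banana pear > "$workdir/old"
printf '%s\n' banana kiwi pear  > "$workdir/new"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from being treated as a failure.
diff_output=$(diff "$workdir/old" "$workdir/new" || true)

# Keep only the "< " lines (removed) or "> " lines (added), minus the prefix.
removed=$(printf '%s\n' "$diff_output" | sed -n 's/^< //p')
added=$(printf '%s\n' "$diff_output" | sed -n 's/^> //p')

printf 'removed: %s\nadded: %s\n' "$removed" "$added"
rm -rf "$workdir"
```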

edited Apr 08 '21 at 18:31

JL Peyret

answered Jun 22 '12 at 22:58

Levon

I'll just emphasize that the `diff` command has a ridiculous number of options for formatting the output, which could provide a convenient input to the program that will process the differences. – chepner Jun 22 '12 at 23:16
@chepner good point .. it's definitely worth checking out the linked man page. – Levon Jun 22 '12 at 23:17

score 5 · Answer 3 · answered Nov 07 '13 at 00:10

Consider using Ruby if your scripts need readability.

To get the lines only in the old file:

ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')"

To get the lines only in the new file:

ruby -e "puts File.readlines('/tmp/newList') - File.readlines('/tmp/oldList')"

You can feed that into a while read loop to process each line:

while IFS= read -r old ; do
  printf '%s\n' "$old"   # placeholder: do stuff with "$old" here
done < <(ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')")
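
If Ruby is not available, a similar order-preserving subtraction can be sketched in the shell with grep. This swaps in a different technique from the Ruby one-liners: each line of one file is used as a fixed-string, whole-line pattern against the other. The sample lists below are made up for the example.

```shell
# Hypothetical sample data, deliberately unsorted.
workdir=$(mktemp -d)
printf '%s\n' pear fig banana apple > "$workdir/oldList"
printf '%s\n' banana kiwi pear      > "$workdir/newList"

# -F: patterns are fixed strings, -x: match whole lines only,
# -v: invert the match, -f FILE: read one pattern per line from FILE.
# grep exits with status 1 when it prints nothing, hence "|| true".
only_old=$(grep -Fxv -f "$workdir/newList" "$workdir/oldList" || true)
only_new=$(grep -Fxv -f "$workdir/oldList" "$workdir/newList" || true)

printf 'only in old:\n%s\nonly in new:\n%s\n' "$only_old" "$only_new"
rm -rf "$workdir"
```

Unlike comm, this needs no sorting, and the results keep the order in which the lines appear in the original file.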

score 1 · Answer 4 · answered Feb 03 '15 at 19:57

This is old, but for completeness we should say that if you have a really large set, the fastest solution would be to use diff to generate a script and then source it, like this:

#!/bin/bash

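line_added() {
   # code to be run for all lines added
   # $* is the line
   :  # no-op placeholder: bash rejects a function body with only comments
}

line_removed() {
   # code to be run for all lines removed
   # $* is the line
   :
}

line_same() {
   # code to be run for all lines that are the same
   # $* is the line
   :
}

# diff compares line by line, so sort both lists first
sort /tmp/oldList >/tmp/oldList.sorted
sort /tmp/newList >/tmp/newList.sorted

# emit one function call per line; %L is the line including its newline
diff >/tmp/diff_script.sh \
    --new-line-format="line_added %L" \
    --old-line-format="line_removed %L" \
    --unchanged-line-format="line_same %L" \
    /tmp/oldList.sorted /tmp/newList.sorted

# the lines are pasted into the script unquoted, so this is only safe
# when the lists contain no shell metacharacters
source /tmp/diff_script.sh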

Lines changed will appear as deleted and added. If you don't like this, you can use --changed-group-format. Check the diff manual page.
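
When the lists may contain spaces, quotes or other shell metacharacters, a variant that avoids sourcing generated code could look like the sketch below. It assumes GNU diff (for the line-format options) and uses a hypothetical classify function with made-up sample data: each line gets a one-character tag and is read back as plain data.

```shell
# classify OLD NEW: report every line of two sorted files as
# removed, added or same, without executing any of the lines.
classify() {
    diff --old-line-format='-%L' \
         --new-line-format='+%L' \
         --unchanged-line-format='=%L' \
         "$1" "$2" |
    while IFS= read -r tagged; do
        line=${tagged#?}   # strip the one-character tag
        case $tagged in
            -*) printf 'removed: %s\n' "$line" ;;
            +*) printf 'added: %s\n' "$line" ;;
            =*) printf 'same: %s\n' "$line" ;;
        esac
    done
}

# Hypothetical sample data, already sorted.
workdir=$(mktemp -d)
printf '%s\n' apple banana pear > "$workdir/old.sorted"
printf '%s\n' banana kiwi pear  > "$workdir/new.sorted"

report=$(classify "$workdir/old.sorted" "$workdir/new.sorted")
printf '%s\n' "$report"
rm -rf "$workdir"
```

The printf calls stand in for the line_removed, line_added and line_same handlers. This is likely slower than sourcing a script for very large lists, which is the trade-off for treating the lines strictly as data.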

score 1 · Answer 5 · edited Jan 26 '19 at 11:42

I typically use:

diff /tmp/oldList /tmp/newList | grep -v "Common subdirectories"

The grep -v option inverts the match:

-v, --invert-match Selected lines are those not matching any of the specified patterns.

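So in this case it takes the diff output and drops any line containing "Common subdirectories". diff prints that message only when it is comparing directories, so when the arguments are two plain files the filter has no effect.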

edited Jan 26 '19 at 11:42

answered Jan 26 '19 at 11:35

Nathan

score -1 · Answer 6 · answered Jun 22 '12 at 22:58

Have you tried diff?

$ diff /tmp/oldList /tmp/newList

$ man diff

answered Jun 22 '12 at 22:58

ssedano
