
I am trying to use comm to compute the difference between two sorted files, but the result doesn't make sense. What's wrong? I want to show the strings that exist in test2 but not in test1, and then the strings that exist in test1 but not in test2.

>test1
a
b
d
g

>test2
e
g 
k
p

>comm test1 test2
a
b
d
    e
g
    g 
    k
    p
user121196

2 Answers


To show the lines that exist in test2 but not in test1, write either of these:

comm -13 test1 test2
comm -23 test2 test1

(-1 hides the column with lines that exist only in the first file; -2 hides the column with lines that exist only in the second file; -3 hides the column with lines that exist in both files.)

To show the lines that exist in test1 but not in test2, do the reverse: swap either the file names or the options (for example, comm -23 test1 test2).
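As a minimal sketch of both directions, the following recreates the two files from the question in a temporary directory, assuming neither file has trailing whitespace. The variable names (tmpdir, only_in_test2, only_in_test1) are only illustrative.

```shell
# Recreate the two sorted files from the question (no trailing whitespace).
tmpdir=$(mktemp -d)
printf '%s\n' a b d g > "$tmpdir/test1"
printf '%s\n' e g k p > "$tmpdir/test2"

# Lines only in test2: hide column 1 (only in test1) and column 3 (common).
only_in_test2=$(comm -13 "$tmpdir/test1" "$tmpdir/test2")

# Lines only in test1: hide column 2 (only in test2) and column 3 (common).
only_in_test1=$(comm -23 "$tmpdir/test1" "$tmpdir/test2")

printf 'only in test2:\n%s\n' "$only_in_test2"   # e, k, p
printf 'only in test1:\n%s\n' "$only_in_test1"   # a, b, d

rm -r "$tmpdir"
```

With clean input, g is treated as common to both files and is hidden along with column 3 in both commands.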

Note that g on a line by itself is considered distinct from g with a space after it, which is why you get

g
    g 

instead of

        g
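One way to confirm and work around this, sketched under the assumption that the only stray characters are trailing whitespace: make the line endings visible with sed, strip the whitespace, and compare again. The file name test2.clean and the variable names are made up for this example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b d g > "$tmpdir/test1"
printf '%s\n' e 'g ' k p > "$tmpdir/test2"   # note the space after g

# sed's l command prints each line unambiguously and marks its end with $,
# so a trailing space becomes visible as "g $".
visible=$(sed -n 'l' "$tmpdir/test2")
printf '%s\n' "$visible"

# Strip trailing whitespace, then re-sort because comm expects sorted input.
sed 's/[[:space:]]*$//' "$tmpdir/test2" | sort > "$tmpdir/test2.clean"

# Show only column 3 (lines common to both files).
common=$(comm -12 "$tmpdir/test1" "$tmpdir/test2.clean")
printf 'common: %s\n' "$common"   # common: g

rm -r "$tmpdir"
```

Other hidden characters, such as carriage returns from Windows line endings, would also show up in the sed l output but would need a different cleanup step.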
ruakh

Add a line that is common to both files, say 'z' at the end of each. You'll see that a third column appears, indicating that the value is common to both files.

The output is meant to show lines unique to file1 in column 1 and lines unique to file2 in column 2.

Finally, the arguments -1, -2 and -3 suppress the output column with that number; for example, -1 suppresses column 1.
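To make the column layout concrete, here is a small sketch that runs comm with no options on clean copies of the question's files with 'z' appended to each. It assumes the files contain no trailing whitespace, and the variable names are illustrative.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b d g z > "$tmpdir/test1"
printf '%s\n' e g k p z > "$tmpdir/test2"

# With no options comm prints all three columns. Column 1 has no indent,
# column 2 is prefixed with one tab, and column 3 with two tabs.
all_columns=$(comm "$tmpdir/test1" "$tmpdir/test2")
printf '%s\n' "$all_columns"

rm -r "$tmpdir"
```

Here a, b and d land in column 1, e, k and p in column 2, and g and z in column 3.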

I hope this helps.

shellter
  • @shelter: g is the common character. Anyway, there must have been some hidden characters in the files that messed up comm; I rewrote the files with the same letters and the result is correct now. – user121196 Dec 22 '11 at 01:06
  • @user121196 : yes, sorry I missed that detail, I didn't have my glasses on as I read your post. Glad you got a solution. Good luck. – shellter Dec 22 '11 at 01:21