1

I've used the comm command to compare two files, but I'm unable to redirect its output to a third file:

comm file1 file2 > file3 

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order

How do I do this? The files are sorted already.

(comm file1 file2 works and prints it out)

sample input:
file1:

21
24
31
36
40
87
105
134
...

file2:

10
21
31
36
40
40
87
103
...

comm file1 file2: works

comm file1 file2 > file3 

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
Kevin Reid
user794479

7 Answers

6

You've sorted numerically; comm works on lexically sorted files.

For instance, in file2, the line 103 is dramatically out of order with the lines 21..87. Your files must be 'plain sort sorted'.
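The difference between the two orderings can be checked directly with sort -c, which reports whether a file is already in order without printing the sorted data. The following is a minimal sketch, assuming a scratch directory and a file2 that holds only the values shown in the question (the real file continues past the "..."); LC_ALL=C is set so that the collation order is predictable.

```sh
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

# Assumed sample: the values listed for file2 in the question.
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

# sort -c only checks the order; a non-zero exit status means "not sorted".
if sort -c file2 2>/dev/null; then lexical=yes; else lexical=no; fi
if sort -n -c file2 2>/dev/null; then numeric=yes; else numeric=no; fi

echo "sorted lexically (what comm expects): $lexical"
echo "sorted numerically (sort -n):         $numeric"
```

With this sample the file passes the numeric check but fails the plain one, because the string 103 compares lower than 21.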

If you've got bash (or another shell that supports it, such as zsh or ksh93), you can use process substitution:

comm <(sort file1) <(sort file2)

This runs the two commands and ensures that the comm process gets to read their standard output as if they were files.

Failing that:

(
sort -o file1 file1 &
sort -o file2 file2 &
wait
comm file1 file2
)

This uses parallelism to get both files sorted at the same time. The sub-shell (the ( ... )) ensures that wait only waits for these two sorts, so you don't end up waiting for other background processes to finish.
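Note that sort -o file1 file1 overwrites the original files. If they need to stay in numeric order, a variant of the same idea sorts into separate files and redirects the comparison into file3, as the question asks. This is a sketch only: the .sorted file names are made up for illustration, and the sample data is limited to the values shown in the question.

```sh
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
export LC_ALL=C   # sort and comm must agree on the collation order

# Assumed sample data: the values listed in the question.
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

# Sort both inputs in parallel into new files, leaving the originals alone.
sort file1 > file1.sorted &
sort file2 > file2.sorted &
wait

# The inputs are now in the order comm expects, so the redirect is clean.
comm file1.sorted file2.sorted > file3
status=$?
echo "comm exit status: $status"
cat file3
```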

Jonathan Leffler
3

Your sample data is NOT sorted lexicographically (as in a dictionary), which is what commands like comm and sort (without the -n option) expect. In lexicographic order, 100 comes before 20, for example, because the strings are compared character by character.

Are you sure that you aren't simply not noticing the error message when you don't redirect the output, since the error would be intermixed with the output lines on the terminal?
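One way to test this is to send the two streams to different places. The sketch below assumes comm writes its diagnostics to standard error, as GNU coreutils does; other implementations may not check the order at all, in which case errors.txt (a name chosen here for illustration) simply stays empty. The sample data is limited to the values shown in the question.

```sh
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
export LC_ALL=C

# Assumed sample data: the numerically ordered values from the question.
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

# '>' redirects only standard output; '2>' captures standard error separately.
comm file1 file2 > file3 2> errors.txt
status=$?

echo "exit status: $status"
echo "lines written to file3: $(wc -l < file3)"
echo "diagnostics, if any:"
cat errors.txt
```

If errors.txt contains the "not in sorted order" messages, the same messages were also being printed in the unredirected run, just interleaved with the regular output.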

Kevin Reid
1

You have to sort the files first with the sort program.

Joni
  • 2
    Then use comm's `--nocheck-order` switch, although I'd try to find out why `comm` thinks they are not sorted. – Joni Feb 08 '12 at 19:35
  • Here's why the error occurs with proofs: https://unix.stackexchange.com/a/573503/334294 – F1Linux Mar 18 '20 at 11:35
  • 1
    Note that this answer was written before the question was edited to add the claim that the files were already sorted. – Joni Mar 18 '20 at 13:43
1

Try:

sort -o file1 file1
sort -o file2 file2
comm file1 file2 > file3
Mithrandir
1

I don't get the same results as you, but perhaps your version of comm is complaining that the files are not sorted lexically. Using the input you provided (the ... makes it interesting; I know it's not part of your actual files):

$ comm file[12]
        10
                21
24
                31
                36
                40
        40
                87
        103
        ...
105
134
...

I was surprised that ... wasn't in the third column, so I tried:

$ comm <(sort file1) <(sort file2)
                ...
        10
        103
105
134
                21
24
                31
                36
                40
        40
                87

That's better, but 105 > 24, right?

$ comm <(sort -n file1) <(sort -n file2)
                ...
        10
                21
24
                31
                36
                40
        40
                87
        103
105
134

I think those are the results you were looking for. The two 40s are also interesting. If you want to eliminate these:

$ comm <(sort -nu file1) <(sort -nu file2)
                ...
        10
                21
24
                31
                36
                40
                87
        103
105
134
johnsyweb
0

I ran into a similar issue, where comm was complaining even though I had run sort. The problem was that I was running Cygwin, and sort pointed to some MSDOS version (I guess). By using the specific path (C:\Cygwin\bin\sort in my case), it worked.
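A quick way to see which programs the shell actually picks up is command -v, which prints the path (or alias or function name) that a command name resolves to. This is only a sketch of the check; the paths it prints depend entirely on the local PATH.

```sh
# Show what the shell would run for each command name.
sort_path=$(command -v sort)
comm_path=$(command -v comm)

echo "sort resolves to: $sort_path"
echo "comm resolves to: $comm_path"
```

If the two paths come from different installations, calling sort by its full path, as described above, is one way around the mismatch.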

MaxH
0

I had a similar issue when I had sorted files but was getting the same error with

comm -23 16-unique.log 23-unique.log > 16-only.log

but I figured the redirection wasn't working properly so I tried

(comm -23 16-unique.log 23-unique.log ) > 16-only.log

but using sort to ensure the inputs were sorted was the business.

comm -23 <(sort 16-unique.log) <( sort 23-unique.log) > 16-only.log

[As an aside, the -23 switch suppresses columns 2 and 3, so only the lines unique to the first file appear in the output.] See also man comm.
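The column switches can be tried on a small example. The sketch below uses two made-up files, a.txt and b.txt, that are already in lexical order; each digit after the dash names a column to suppress (1 = lines only in the first file, 2 = lines only in the second, 3 = lines in both).

```sh
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
export LC_ALL=C

# Hypothetical inputs, already lexically sorted.
printf '%s\n' apple banana cherry > a.txt
printf '%s\n' banana cherry date > b.txt

only_a=$(comm -23 a.txt b.txt)   # suppress columns 2 and 3
only_b=$(comm -13 a.txt b.txt)   # suppress columns 1 and 3
both=$(comm -12 a.txt b.txt)     # suppress columns 1 and 2

echo "only in a.txt: $only_a"
echo "only in b.txt: $only_b"
echo "in both:"
echo "$both"
```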

njames