I have the files file1 and file2, where file2 is a subset of file1. That means, if I iterate over file1, there are some lines that are in file2, and some that aren't, but there is no line in file2 that is not in file1. There may be several lines with the same content in a file. Now I want to get the difference between them, that is, all lines of file1 that aren't in file2.
According to this well-received answer, diff(1) isn't the answer, comm(1) is (for whatever reason).
But as I understand it, comm needs the files to be sorted first. The problem: both files are ordered (not sorted!), and this order needs to be kept. So what I really want is to iterate over file1 and check, for every line, whether it is also in file2. If not, write it to file3. If the same content occurs more than once, it should be kept more than once!
Is there any way to do this on the command line?
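To make the intended procedure concrete, here is a rough sketch of the loop described above, written with awk. It rests on assumptions: lines are compared exactly and as whole lines, every occurrence of a line that appears in file2 is dropped, and the sample contents of file1 and file2 are made up purely for illustration. If each line in file2 should instead cancel only one occurrence in file1, this would need per-line counts rather than a simple lookup.

```shell
# Hypothetical sample data, only for illustration (overwrites file1/file2 here).
printf '%s\n' c a b a d > file1
printf '%s\n' a d > file2

# While reading the first operand (file2), remember each line.
# While reading file1, print only lines that were not remembered.
# Order and repeated lines of file1 are preserved.
awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' file2 file1 > file3

cat file3
```

Nothing is sorted here, so the lines in file3 stay in the order they had in file1.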