Working in a Linux shell environment, how can I accomplish the following?

text file 1 contains:

1
2
3
4
5

text file 2 contains:

6
7
1
2
3
4

I need to extract the entries in file 2 that are not in file 1 ('6' and '7' in this example), and also show which file they were found in. For example: 6 and 7 were found in file 2.

I am already using this awk command:

awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2

But this command only shows the difference (6 and 7), not where it found them.

How can I do this from the command line?

Many thanks!

codeforester
craken
  • What do you mean by **where it found it**? Do you want to print the filename along with the missing entries? – anubhava Dec 15 '15 at 16:32
  • Yes, I want to print the filename that contains the difference. – craken Dec 16 '15 at 08:49
  • So even when file1 has some extra lines you want to print them as well? – anubhava Dec 16 '15 at 09:49
  • See also: [Fastest way to find lines of a file from another larger file in Bash](https://stackoverflow.com/questions/42239179/fastest-way-to-find-lines-of-a-file-from-another-larger-file-in-bash). – codeforester Mar 05 '18 at 18:09

2 Answers

Using awk you can do this:

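awk 'FNR==NR { seen[$1]=FILENAME; next }  # file1: remember each entry and its file
  { if ($1 in seen) delete seen[$1]; else print $1, FILENAME }  # file2: drop common entries, report new ones
  END { for (i in seen) print i, seen[i] }  # leftovers exist only in file1
' file{1,2}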
6 file2
7 file2
5 file1

While traversing file1, we store the first column of each row as a key in the array seen, with FILENAME as the value. Then, while iterating over file2, we print each entry that is not in seen and delete each entry that is (these are the common entries). Finally, the END block prints whatever remains in seen, which are the entries found only in file1.
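
If only the entries that are new in file2 are wanted (which is how the question was first phrased), a minimal sketch could look like the following. It assumes one entry per line with no embedded whitespace; the temporary directory and the `result` variable are illustrative names, not part of the command above.

```shell
# Work in a throwaway directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the sample data from the question.
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# Remember every entry of file1, then print each file2 entry that was
# never seen, followed by the name of the file it came from.
result=$(awk 'FNR==NR { seen[$1]; next }
  !($1 in seen) { print $1, FILENAME }' file1 file2)

printf '%s\n' "$result"
```

Because nothing is deleted from seen here, a value that appears several times in file2 but also exists in file1 is never reported.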

anubhava

The comm program reports which lines two files have in common and which are unique to each file. comm expects both inputs to be sorted lexically, so the files are sorted first in the commands below.

$ echo "only in file1"; comm -2 -3 <(sort file1) <(sort file2)
only in file1
5

$ echo "only in file2"; comm -1 -3 <(sort file1) <(sort file2)
only in file2
6
7

$ echo "common to file1 and file2"; comm -1 -2 <(sort file1) <(sort file2)
common to file1 and file2
1
2
3
4
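
To get the file name next to each unique entry, the output of comm -3 could be post-processed with awk. This is only a sketch: it assumes the entries contain no tab characters and no empty lines, and the names file1.sorted, file2.sorted and `labeled` are made up for illustration. The sorted copies are written to files instead of using <(...) so that the snippet does not depend on bash.

```shell
# Work in a throwaway directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the sample data from the question.
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# comm needs sorted input.
sort file1 > file1.sorted
sort file2 > file2.sorted

# comm -3 drops the common lines. Lines unique to the first file start at
# column 1; lines unique to the second file are prefixed with one tab.
labeled=$(comm -3 file1.sorted file2.sorted |
  awk -F'\t' '$1 != "" { print $1, "file1"; next }
              { print $2, "file2" }')

printf '%s\n' "$labeled"
```

With the sample data this labels 5 as coming from file1 and 6 and 7 as coming from file2.
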
glenn jackman