
I have a list file containing ids and numbers, and I am trying to extract the lines from a master file whose ids do not appear in the list file.

List file (list.txt):

nw_66 17296
nw_67 21414
nw_68 21372
nw_69 27387
nw_70 15830
nw_71 32348
nw_72 21925
nw_73 20363

Master file (master.txt):

nw_1 5896
nw_2 52814
nw_3 14537
nw_4 87323
nw_5 56466
......
......
nw_n xxxxx

So far I am trying the following, but it does not work as expected:

for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;

Kindly help.

3 Answers


Give this awk one-liner a try:

awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master
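To see the one-liner in action, here is a small run on a few lines taken from the question. The file names list.txt and master.txt follow the OP's own attempt. The extra master line for nw_66 is hypothetical and is there only so that one line gets filtered out.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Two ids from the question's list file.
printf 'nw_66 17296\nnw_67 21414\n' > list.txt

# A small master file; the nw_66 line is hypothetical, added so it gets excluded.
printf 'nw_1 5896\nnw_2 52814\nnw_66 17296\nnw_3 14537\n' > master.txt

# First file (NR==FNR): remember every id from column 1 of list.txt.
# Second file: print a master line only if its id was not remembered.
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list.txt master.txt
```

This prints the nw_1, nw_2 and nw_3 lines and drops the nw_66 line.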
Kent

Maybe this helps:

awk 'NR == FNR {id[$1]=1;next}
{
    if (id[$1] == "") {
        print $0
    }
}' listfile masterfile

The command above takes two input files: the first is listfile and the second is masterfile.

NR == FNR is true only while awk reads the first file, listfile. During that pass, every id in listfile is stored as a key in the associative array id[] with the value 1.

When awk then reads masterfile, it prints a line only if id[$1] is empty, i.e. the id in $1 did not appear in listfile.
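A slightly different sketch of the same two-pass idea (not the answer's exact code) tests membership with awk's `in` operator. Looking up `id[$1]`, as the comparison with `""` does, creates an empty entry for every master id it checks. `($1 in id)` only asks whether the key exists and does not add it. The sample lines below are taken from the question.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
printf 'nw_68 21372\n' > listfile
printf 'nw_4 87323\nnw_68 21372\nnw_5 56466\n' > masterfile

# Pass 1 (NR == FNR): store each id from listfile as a key of id[].
# Pass 2: a pattern with no action prints the line, here only when its id is not a key.
awk 'NR == FNR { id[$1] = 1; next }
     !($1 in id)' listfile masterfile
```

This prints the nw_4 and nw_5 lines. One caveat applies to both answers: if listfile is empty, NR == FNR also holds for every line of masterfile, so nothing is printed at all.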

Mihir Luthra

The OP attempted the following line:

for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;

This line does not work: for every entry $i, it prints all lines of master.txt that do not contain "$i" as a whole word. As a consequence, you end up with multiple copies of master.txt, each one missing the line for a single id.

Example:

$ for i in 1 2; do grep -v -w "$i" <(seq 1 3); done
2     \ copy of seq 1 3 without entry 1
3     /
1     \ copy of seq 1 3 without entry 2
3     /

Furthermore, the loop reads master.txt once for every id in list.txt, which is very inefficient.

The Unix tool grep can check multiple patterns stored in a file in a single pass. This is done with the -f flag. Normally this looks like:

$ grep -f list.txt master.txt

The OP can now use it as follows:

$ grep -vwf <(awk '{print $1}' list.txt) master.txt

However, this matches the ids anywhere in the line, not only in the first column.
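grep -f treats each line of the pattern file as a regular expression. The ids in the question contain no special characters, but if they might, adding -F makes grep compare them as fixed strings. Below is a minimal sketch under that assumption. It writes the ids to a hypothetical scratch file ids.txt instead of using the `<(...)` process substitution, so it also runs under a plain sh. Note that -w is a GNU/BSD grep extension rather than POSIX. The id nw_6 is hypothetical and is chosen to show that -w keeps nw_6 from matching nw_66.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
printf 'nw_6 100\n' > list.txt
printf 'nw_6 100\nnw_66 17296\nnw_7 200\n' > master.txt

# Extract the ids (column 1) into a pattern file.
awk '{print $1}' list.txt > ids.txt

# -v: keep non-matching lines; -w: whole-word matches only (nw_6 does not hit nw_66);
# -F: treat each id as a fixed string; -f: read the patterns from ids.txt.
grep -vwFf ids.txt master.txt
```

This prints the nw_66 and nw_7 lines.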

The awk solution presented by Kent is more flexible and lets the OP define a more precise match:

awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master

Here the OP states explicitly: match column 1 of list against column 1 of master, regardless of whitespace or whatever is in column 2. The grep solution could still match entries in column 2.
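The difference shows up when an id also occurs in column 2 of the master file. The data below is hypothetical: numeric ids are used only to make the overlap easy to produce. The sketch runs both approaches on the same input.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
# Hypothetical data: id 101 also appears in column 2 of master.txt.
printf '101 1\n' > list.txt
printf '101 7\n202 101\n303 9\n' > master.txt

awk '{print $1}' list.txt > ids.txt

echo "grep:"
# Matches 101 anywhere in the line, so "202 101" is dropped as well.
grep -vwf ids.txt master.txt

echo "awk:"
# Compares column 1 only, so "202 101" is kept.
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list.txt master.txt
```

grep keeps only `303 9`, while awk keeps both `202 101` and `303 9`.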

kvantour