I need to find the records in data.txt that do not match any line in filter.txt. Earlier I used grep -vf filter.txt data.txt,
which worked correctly but was very slow.
Following the discussion in "grep -vf too slow with large files", I switched to
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
which works as long as filter.txt is not empty:
data.txt
data1
data2
data3
filter.txt
data1
op.txt
data2
data3
However, it fails when filter.txt is empty: op.txt is then also empty, when it should be identical to data.txt.
I also tried ARGIND==1. It seems to work for an empty filter.txt but produces wrong results for a non-empty one. The expected behaviour is shown above.
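As far as I can tell, the empty-file case follows from how the counters work: NR counts records across all inputs while FNR restarts for each file, so when filter.txt contributes no records, FNR==NR is still true while data.txt is being read, and every data line is stored and skipped. Below is a minimal sketch that tests the file name instead of the counters. It assumes a POSIX awk and that the two arguments are different paths; filter_out and the temporary files are hypothetical names used only for illustration.

```shell
# Sketch: decide "is this the filter file?" by name rather than by record counters.
# Assumes the filter and data arguments are different paths.
filter_out() {
    awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$1" "$2"
}

# Throwaway sample files mirroring the example in the question.
tmp=$(mktemp -d)
printf 'data1\ndata2\ndata3\n' > "$tmp/data.txt"

# Non-empty filter: prints data2 and data3.
printf 'data1\n' > "$tmp/filter.txt"
filter_out "$tmp/filter.txt" "$tmp/data.txt"

# Empty filter: prints all three data lines.
: > "$tmp/filter.txt"
filter_out "$tmp/filter.txt" "$tmp/data.txt"
```

With an empty filter file, FILENAME is already the data file when the first record arrives, so the first rule never fires and every line is printed.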
$ cat filter.txt
abc2
$ awk 'ARGIND==1{hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
$ cat op.txt
abc2
abc1
abc2
abc3
$ vi filter.txt
$ cat filter.txt
$ awk 'ARGIND==1{hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
$ cat op.txt
abc1
abc2
abc3
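One possible explanation for the ARGIND result, offered as an assumption rather than a confirmed diagnosis: ARGIND is a gawk extension, and in other awk implementations it is just an unset variable, so ARGIND==1 is never true and every line of both files is printed. That would be consistent with the filter line abc2 showing up at the top of op.txt. The sketch below is one way to check which flavour of awk is on the PATH; awk_flavour is a hypothetical helper name, and it relies on gawk setting PROCINFO["version"].

```shell
# Sketch: report whether the awk on PATH looks like gawk.
# gawk sets PROCINFO["version"]; other awks treat PROCINFO as an ordinary
# empty array, so the lookup yields an empty string there.
awk_flavour() {
    awk 'BEGIN { print (PROCINFO["version"] != "" ? "gawk" : "not gawk") }'
}

awk_flavour
```

If this prints "not gawk", ARGIND cannot be relied on and a portable condition is needed instead.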