`grep -f filter.txt data.txt` gets unruly when filter.txt is larger than a couple of thousand lines, and hence isn't the best choice for such a situation. Even while using `grep -f`, we need to keep a few things in mind:
- use the `-x` option if there is a need to match entire lines in the second file
- use `-F` if the first file has fixed strings, not regex patterns
- use `-w` to prevent partial matches while not using the `-x` option
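Combining those flags, a typical fixed-string, whole-line filter looks like the following (file contents are made up for illustration):

```shell
# Sample filter and data files (hypothetical contents)
printf 'apple\nbanana\n' > filter.txt
printf 'apple\npineapple\nbanana split\nbanana\n' > data.txt

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -f: read the patterns from filter.txt
grep -Fxf filter.txt data.txt
# -> apple
#    banana
```

Without `-x`, "pineapple" and "banana split" would also match, since "apple" and "banana" occur inside them.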
This post has a great discussion on this topic (`grep -f` on large files):
And this post talks about `grep -vf`:
In summary, the best way to handle `grep -f` on large files is:
Matching entire line:
awk 'FNR==NR {hash[$0]; next} $0 in hash' filter.txt data.txt > matching.txt
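As a quick sanity check with toy files (contents made up for illustration), the `FNR==NR` condition is true only while awk reads the first file, so filter.txt lines become hash keys before data.txt is scanned:

```shell
printf 'apple\nbanana\n' > filter.txt
printf 'apple\npineapple\nbanana\n' > data.txt

# First pass stores each filter.txt line as a hash key;
# second pass prints data.txt lines that exist as keys.
awk 'FNR==NR {hash[$0]; next} $0 in hash' filter.txt data.txt
# -> apple
#    banana
```

Unlike `grep -f`, the lookup is a hash-table membership test, so the cost stays roughly linear in the total input size instead of growing with the number of patterns.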
Matching a particular field in the second file (using ',' delimiter and field 2 in this example):
awk -F, 'FNR==NR {hash[$1]; next} $2 in hash' filter.txt data.txt > matching.txt
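A small sketch of the field-based variant, again with hypothetical sample data, where filter.txt holds the values to look up and field 2 of data.txt is checked against them:

```shell
printf '101\n103\n' > filter.txt
printf 'a,101,x\nb,102,y\nc,103,z\n' > data.txt

# -F, splits on commas; filter.txt values become hash keys,
# and a data.txt line is printed when its second field is a key.
awk -F, 'FNR==NR {hash[$1]; next} $2 in hash' filter.txt data.txt
# -> a,101,x
#    c,103,z
```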
and for `grep -vf`:
Matching entire line:
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > not_matching.txt
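The inverted version only changes the membership test; with toy files (contents made up for illustration):

```shell
printf 'apple\n' > filter.txt
printf 'apple\nbanana\n' > data.txt

# Same FNR==NR idiom, but keep lines NOT present in the hash.
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt
# -> banana
```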
Matching a particular field in the second file (using ',' delimiter and field 2 in this example):
awk -F, 'FNR==NR {hash[$1]; next} !($2 in hash)' filter.txt data.txt > not_matching.txt
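And a matching sketch for the inverted field-based case, using the same hypothetical sample data as above:

```shell
printf '101\n103\n' > filter.txt
printf 'a,101,x\nb,102,y\nc,103,z\n' > data.txt

# Print data.txt lines whose second field is NOT in filter.txt.
awk -F, 'FNR==NR {hash[$1]; next} !($2 in hash)' filter.txt data.txt
# -> b,102,y
```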