awk to match pattern from a file to another file

Question

To find a string in a file and print first column of the output, we can use

grep "foo" file.txt | awk '{print $1}' which can be done using awk alone
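For reference, the awk-only form of that pipeline is a single pattern-action rule. A minimal sketch, assuming a small hypothetical input file named sample.txt (not part of the question):

```shell
# Hypothetical sample input, used only for illustration.
printf '%s\n' '1 foo bar' '2 baz qux' '3 xfoo' > sample.txt

# Print the first field of every line that matches the regex foo.
# Equivalent to: grep "foo" sample.txt | awk '{print $1}'
awk '/foo/ {print $1}' sample.txt
```

With this sample input the command prints 1 and 3, one per line.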

Instead of a single string (foo) as a pattern, I want to search for a list of strings in a file. Using grep, it would be

grep -f file.txt file2.txt | awk '{print $1}' > outFile.txt

Can I do the same using awk alone?

file.txt
abcd
acde
a2rt

file2.txt
1 albcd dhakd kdf
3 abcdbd and
2a bda2rt tert

outFile.txt
3
2a

Thanks! Abhishek

Which columns do you want to compare between file.txt and file2.txt? Please be clear so that we can try to help you. – RavinderSingh13, Sep 13 '17 at 13:42
file.txt contains one string (pattern) per line, and a match can occur anywhere in a line of file2.txt (no specific column). – Abhishek, Sep 13 '17 at 13:52

score 5 · Accepted Answer · answered Sep 13 '17 at 13:46


Equivalent awk command will be this one:

awk 'NR==FNR{a[$1]; next} {for (i in a) if (index($0, i)) print $1}' file.txt file2.txt

Output:

3
2a

This uses a non-regex substring test (index($0, i)) instead of a regex match ($0 ~ i), to mirror the -F (fixed strings) option of grep.
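The difference between the two tests only shows up when a pattern contains a regex metacharacter. A small sketch, assuming hypothetical files pat.txt and data.txt (not from the question) where the pattern contains a dot:

```shell
# Hypothetical inputs: the pattern contains a dot, a regex metacharacter.
printf '%s\n' 'a.c' > pat.txt
printf '%s\n' '1 xa.cx' '2 xabcx' > data.txt

# Fixed-string search: only the line holding a literal "a.c" matches.
awk 'NR==FNR{a[$1]; next} {for (i in a) if (index($0, i)) print $1}' pat.txt data.txt

# Regex match: the dot matches any character, so both lines match.
awk 'NR==FNR{a[$1]; next} {for (i in a) if ($0 ~ i) print $1}' pat.txt data.txt
```

The first command prints 1 only; the second prints 1 and 2. Which one is right depends on whether the pattern file is meant to hold literal strings or regular expressions.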


anubhava

Could you explain in more detail what is going on here? – UlfR Sep 26 '18 at 12:07

In the first pass we build an array keyed by the 1st column of `file.txt`. In the second pass we loop through the array and check whether each key is a substring of the full line from the 2nd file. – anubhava Sep 26 '18 at 16:07
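The two passes described above can be spelled out as a commented script. This is a sketch rather than the answer's exact command: the array name patterns and the break are additions, the latter on the assumption that a line containing several patterns should be printed once, as grep would do.

```shell
# Recreate the sample files from the question.
printf '%s\n' abcd acde a2rt > file.txt
printf '%s\n' '1 albcd dhakd kdf' '3 abcdbd and' '2a bda2rt tert' > file2.txt

awk '
    # First pass: NR == FNR holds only while reading the first file.
    # Store each pattern as an array key, then skip to the next line.
    NR == FNR { patterns[$1]; next }

    # Second pass: test every stored pattern against the whole line.
    {
        for (p in patterns)
            if (index($0, p)) {
                print $1
                break   # assumption: print each matching line once
            }
    }
' file.txt file2.txt > outFile.txt

cat outFile.txt
```

With the question's data this writes 3 and 2a to outFile.txt. Note that keying on $1 keeps only the first whitespace-separated word of each pattern line; keying on $0 instead would keep patterns that contain spaces intact.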
WARNING: this doesn't work for regexps. From https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html: index(in, find) searches the string in for the first occurrence of the string find, and returns the position in characters where that occurrence begins in the string in. For example, `awk 'BEGIN { print index("peanut", "an") }'` prints 3. If find is not found, index() returns zero. With BWK awk and gawk, it is a fatal error to use a regexp constant for find. Other implementations allow it, simply treating the regexp constant as an expression meaning '$0 ~ /regexp/'. (d.c.) – Douwe van der Leest Oct 30 '20 at 12:03
Sorry didn't really understand context of regex here. – anubhava Oct 30 '20 at 14:13
