To find a string in a file and print the first column of each matching line, we can use

grep "foo" file.txt | awk '{print $1}'

which can be done using awk alone:

awk '/foo/ {print $1}' file.txt

(see https://stackoverflow.com/a/22866418/1662898).

Instead of a single string (foo) as the pattern, I want to search file2.txt for a list of strings stored in file.txt. Using grep, it would be

grep -f file.txt file2.txt | awk '{print $1}' > outFile.txt

Can I do the same using awk alone?

file.txt
abcd
acde
a2rt

file2.txt
1 albcd dhakd kdf
3 abcdbd and
2a bda2rt tert

outFile.txt
3
2a

Thanks! Abhishek

1 Answer

The equivalent awk command is:

awk 'NR==FNR{a[$1]; next} {for (i in a) if (index($0, i)) print $1}' file.txt file2.txt

Output:

3
2a

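This uses a non-regex string comparison (index($0, i)) instead of a regex match ($0 ~ i) to mirror grep's -F (fixed strings) option. With -f alone, grep treats each line of file.txt as a regular expression; for the literal strings in this example both approaches give the same result.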
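
For anyone who wants to try this, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the command. The second awk call is a variant that is not part of the answer above: it assumes you want each matching line reported only once, as grep would, even when the line contains several of the search strings.

```shell
# Scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# Recreate the sample inputs from the question.
printf '%s\n' abcd acde a2rt > "$tmp/file.txt"
printf '%s\n' '1 albcd dhakd kdf' '3 abcdbd and' '2a bda2rt tert' > "$tmp/file2.txt"

# Pass 1 (NR==FNR is true only while reading the first file): store each
# search string as an array key. Pass 2: print column 1 of every line that
# contains one of the keys as a literal substring.
out=$(awk 'NR==FNR { a[$1]; next }
           { for (i in a) if (index($0, i)) print $1 }' \
      "$tmp/file.txt" "$tmp/file2.txt")
printf '%s\n' "$out"

# Variant (an assumption about the desired behaviour): "next" leaves the
# loop at the first matching key, so a line is printed at most once.
out_once=$(awk 'NR==FNR { a[$1]; next }
                { for (i in a) if (index($0, i)) { print $1; next } }' \
           "$tmp/file.txt" "$tmp/file2.txt")
printf '%s\n' "$out_once"

rm -r "$tmp"
```

With the sample data both calls print 3 and 2a, because no line contains more than one of the search strings.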

anubhava
  • Could you explain in more detail what is going on here? – UlfR Sep 26 '18 at 12:07
  • In the first pass we build an array from the 1st column of `file.txt`. In the second pass we loop through the array and check whether each entry is a substring of the full line from the 2nd file. – anubhava Sep 26 '18 at 16:07
  • WARNING: this doesn't work for regexps. From https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html: "index(in, find): Search the string in for the first occurrence of the string find, and return the position in characters where that occurrence begins in the string in." For example, `awk 'BEGIN { print index("peanut", "an") }'` prints 3. "If find is not found, index() returns zero. With BWK awk and gawk, it is a fatal error to use a regexp constant for find. Other implementations allow it, simply treating the regexp constant as an expression meaning ‘$0 ~ /regexp/’. (d.c.)" – Douwe van der Leest Oct 30 '20 at 12:03
  • Sorry, I didn't really understand the context of regex here. – anubhava Oct 30 '20 at 14:13
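
To illustrate the distinction raised in the comments: index() compares literal strings, while $0 ~ i treats each stored entry as a dynamic regular expression, which is closer to grep -f without -F. The sketch below uses made-up inputs (patterns.txt and data.txt are hypothetical names, not files from the question) with one pattern that contains the regex metacharacter ".".

```shell
tmp=$(mktemp -d)

# Hypothetical inputs: the pattern "a.c" matches differently as a regex.
printf '%s\n' 'a.c' > "$tmp/patterns.txt"
printf '%s\n' '1 abc' '2 a.c' '3 xyz' > "$tmp/data.txt"

# Fixed-string match: only the line containing a literal "a.c" qualifies.
# "next" stops at the first matching key so each line prints at most once.
fixed=$(awk 'NR==FNR { a[$1]; next }
             { for (i in a) if (index($0, i)) { print $1; next } }' \
        "$tmp/patterns.txt" "$tmp/data.txt")

# Regex match: "." matches any character, so "abc" qualifies as well.
regex=$(awk 'NR==FNR { a[$1]; next }
             { for (i in a) if ($0 ~ i) { print $1; next } }' \
        "$tmp/patterns.txt" "$tmp/data.txt")

echo "fixed-string matches:"; echo "$fixed"
echo "regex matches:";        echo "$regex"

rm -r "$tmp"
```

The fixed-string version reports only 2, while the regex version reports 1 and 2. Which one is "equivalent" depends on whether the original grep call was meant to use -F.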