0

I have read a lot here about awk and variables, but could not find what I need. I have some files ($FILES) in a directory ($DIR) and I want to find every line in those files that contains both of two strings (SEARCH1 and SEARCH2). I am using sh (/bin/bash). I do NOT want to use the read command, so I would prefer awk/grep/sed. The desired output is each line containing both strings, together with the name of the file it came from. This code works as expected:

FILES="news_*.txt"
DIR="/news"

awk '/Corona US/&&/Infected/{print a[FILENAME]?$0:FILENAME RS $0;a[FILENAME]++}' ${DIR}/${FILES}

Now I want to replace the two patterns ('Corona US' and 'Infected') with variables in the awk command, so I tried:

SEARCH1="Corona US"
SEARCH2="Infected"

awk -v str1="$SEARCH1" -v str2="$SEARCH2" '/str1/&&/str2/{print a[FILENAME]?$0:FILENAME RS $0;a[FILENAME]++}' ${DIR}/${FILES}

However that did not give me the right output: it came up empty (didn't find anything).

ni_hao
  • See [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script) and never use the word `pattern` in this context as it's highly ambiguous - simply say `string` or `regexp`, whichever it is you mean. You're using names that start with `str` which implies you want to do a string match but then regexp literal delimiters `/.../` within the script so it's very unclear what type of match you're trying to do. – Ed Morton Apr 19 '20 at 15:24
  • It's also not clear if you want a full match on a "word" (whatever that means to you) or a partial match on a line or within a word or something else. So please [edit] your question to say if you want a regexp or string match and if you want a full or partial match (and if so between what delimiters or at what position(s) in the line) or something else. Some sample input and expected output that covers all your requirements would help a lot too. – Ed Morton Apr 19 '20 at 15:32
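
The string-versus-regexp distinction raised in the comments can be seen with a small sketch. The search value `1.5` and the two input lines are made up for illustration; in a regexp the `.` matches any character, while `index()` compares the characters literally.

```shell
#!/bin/sh
# Hypothetical search value containing a regexp metacharacter (".").
SEARCH="1.5"

# Regexp match: "." matches any single character, so "105" also matches.
as_regexp=$(printf '%s\n' 'rate 1.5' 'rate 105' | awk -v s="$SEARCH" '$0 ~ s')

# String match: index() looks for the exact characters "1.5".
as_string=$(printf '%s\n' 'rate 1.5' 'rate 105' | awk -v s="$SEARCH" 'index($0, s)')

printf 'regexp match:\n%s\n' "$as_regexp"
printf 'string match:\n%s\n' "$as_string"
```

With this input the regexp version prints both lines and the string version prints only `rate 1.5`, which is why it matters which kind of match is wanted.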

2 Answers

3

Since you have not shown sample output, I couldn't test this; it is an attempt to fix the OP's code.

awk -v str1="$SEARCH1" -v str2="$SEARCH2" 'index($0,str1) && index($0,str2){print (seen[FILENAME]++ ? "" : FILENAME ORS) $0}' ${DIR}/${FILES}

OR

awk -v str1="$SEARCH1" -v str2="$SEARCH2" '$0 ~ str1 && $0 ~ str2{print (seen[FILENAME]++ ? "" : FILENAME ORS) $0}' ${DIR}/${FILES}

Issue in the OP's code: awk variables are not expanded inside /.../, so /str1/ searches for the literal text str1. Use index($0, str1) for a string match or $0 ~ str1 for a regexp match instead.
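
To make that concrete, here is a minimal sketch that runs all three forms against a throwaway file. The file name `demo.txt` and its three lines are invented for the demo; only the awk expressions come from the question and the answer above.

```shell
#!/bin/sh
# Made-up sample data in a temporary directory, used only for this demo.
tmp=$(mktemp -d)
printf '%s\n' \
    'Corona US: 100 Infected' \
    'the literal text str1 and str2' \
    'Corona UK: 5 Infected' > "$tmp/demo.txt"

SEARCH1="Corona US"
SEARCH2="Infected"

# /str1/ is a regexp literal: it looks for the characters "str1",
# not for the value of the awk variable str1.
literal=$(awk -v str1="$SEARCH1" -v str2="$SEARCH2" '/str1/ && /str2/' "$tmp/demo.txt")

# index() does a plain substring test against the variable's value.
by_index=$(awk -v str1="$SEARCH1" -v str2="$SEARCH2" 'index($0,str1) && index($0,str2)' "$tmp/demo.txt")

# $0 ~ str1 uses the variable's value as a dynamic regexp.
by_regexp=$(awk -v str1="$SEARCH1" -v str2="$SEARCH2" '$0 ~ str1 && $0 ~ str2' "$tmp/demo.txt")

printf '/str1/ form: %s\n' "$literal"
printf 'index form:  %s\n' "$by_index"
printf '~ form:      %s\n' "$by_regexp"

rm -rf "$tmp"
```

The `/str1/` form only finds the line that happens to contain the words str1 and str2, which explains the empty result on real data; the other two forms find the `Corona US` line.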

RavinderSingh13
  • Very good, thank you RavinderSingh13. Both work, although the first one (the index($0, ...) version) is faster. Thanks again. – ni_hao Apr 19 '20 at 09:04
1

It isn't 100% clear exactly what you are looking for, but it sounds like grep -H with an alternation would let you output the file name and each line that contains both $SEARCH1 and $SEARCH2 anywhere in the line, in either order. For example, you could do:

grep -H "$SEARCH1.*$SEARCH2\|$SEARCH2.*$SEARCH1" "$DIR/"$FILES

(note $FILES must NOT be quoted in order for * expansion to take place.)

If you just want a list of filenames that contain a match on any line, you can change -H to -l.

David C. Rankin
  • FWIW I didn't downvote but that approach isn't scalable (try it when searching for 5 strings instead of 2) and it can't work at all when searching for strings (which I think might be what the OP wants since they used variable names that start with `str` but idk), only regexps. – Ed Morton Apr 19 '20 at 15:26
  • I get your point Ed, thanks. But how is it any less scalable than using `-v str1="$SEARCH1" -v str2="$SEARCH2"`? Going from 2 to 5 terms, that wouldn't be any prettier, would it? – David C. Rankin Apr 19 '20 at 19:13
  • Consider `awk '/a/ && /b/ && /c/ && /d/ && /e/'` vs `grep 'a.*b.*c.*d.*e\|a.*c.*b.*d.*e\|a.*d.*b.*c.*e\|a.*e.*b.*c.*d\|b.*a.*c.*d.*e\|....etc, etc., etc.'`. If you're going to use grep for more than 2 items you need to write `grep a | grep b | grep c | grep d | grep e` instead, so you end up with a different solution than the one you proposed, and that one has a whole bunch of pipes and separate tool calls. You're far better off just using `awk` for any combination of items to match in any order. – Ed Morton Apr 19 '20 at 19:34
  • Yes, that makes sense. At that point there is not much you can do but let a parameter expansion permute the orderings, and the rest gets ugly fast. But for the question as asked, with 2 terms, it was six of one, half a dozen of the other. – David C. Rankin Apr 19 '20 at 20:06
  • Right, unless the OP really does want to do a string rather than regexp search and then we're back to 1 call to awk or a chain of grep pipes but who knows... – Ed Morton Apr 19 '20 at 20:42
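
As a rough sketch of the "one call to awk" option for any number of strings: the helper below, `match_all`, is a hypothetical name and not something from the answers. It assumes plain string matching (via `index()`), takes the number of search strings first, then the strings, then the files, and moves the strings out of `ARGV` in `BEGIN` so awk does not try to open them as files. The sample files are made up for the demo.

```shell
#!/bin/sh
# match_all N STR1 ... STRN FILE...
# Hypothetical helper: print lines that contain all N strings (substring
# tests, any order), with the file name printed once before its first hit.
match_all() {
    n=$1
    shift
    awk -v n="$n" '
        BEGIN {
            # The first n arguments are search strings, not files: copy
            # them out of ARGV and delete them so awk skips them as input.
            for (i = 1; i <= n; i++) { want[i] = ARGV[i]; delete ARGV[i] }
        }
        {
            # Skip the line as soon as one string is missing.
            for (i = 1; i <= n; i++)
                if (!index($0, want[i])) next
            print (seen[FILENAME]++ ? "" : FILENAME ORS) $0
        }
    ' "$@"
}

# Made-up sample data for the demo.
tmp=$(mktemp -d)
printf '%s\n' 'Corona US: 100 Infected, 3 Recovered' 'Corona US: no data' > "$tmp/news_1.txt"
printf '%s\n' 'Recovered and Infected in Corona US' 'Infected only' > "$tmp/news_2.txt"

result=$(match_all 3 "Corona US" "Infected" "Recovered" "$tmp"/news_*.txt)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Passing the strings as arguments rather than with `-v` also sidesteps the escape-sequence processing that awk applies to `-v` assignments, which can matter if a search string contains backslashes. For a regexp match instead of a string match, `!index($0, want[i])` could be swapped for `$0 !~ want[i]`.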