0

I need to find multiple keywords in a log file (AND conditions) and followed the recommendation of putting the args into an array. However, the script throws `No such file or directory`. To prove my args are in order, I cut and pasted the #debug line into the command line and it works.

#!/bin/bash
filter_list=(mod_jk "Dec 04") # array
for i in "${!filter_list[@]}" # with array keys
do
  if [ $i -eq 0 ]; then
    grep_args=(-Ewi "\"${filter_list[$i]}\"" "\"$log_path\"")
  else
    grep_args+=("|") # syntax error near unexpected token `|' if added below instead
    grep_args+=(grep -Ewi "\"${filter_list[$i]}\"") # cannot include pipe | here
  fi
done

grep "${grep_args[@]}" # actual
echo "grep ${grep_args[@]}" # debug

Output:

grep: "/home/user/log_samples/Apache_2k.log": No such file or directory
grep: |: No such file or directory
grep: grep: No such file or directory
grep: "Dec 04": No such file or directory
grep -Ewi "mod_jk" "/home/user/log_samples/Apache_2k.log" | grep -Ewi "Dec 04"
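The error messages follow directly from how arrays expand: each element becomes one literal argument, and the shell never re-parses arguments as syntax, so the `|` and the second `grep` reach the first grep as filenames. A minimal demonstration of that behaviour:

```shell
# A '|' stored in an array stays a literal word; it is never treated as a pipe
cmd=(echo hello "|" tr a-z A-Z)
"${cmd[@]}"
# prints: hello | tr a-z A-Z
```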
busterSg
  • Can you explain your log file and your array better? – Stephen Quan Feb 28 '23 at 06:15
  • 1
You have to build a **string** as *REGEX* in form: `grep -Ewi "(pattern|pattern)"`, not a *list of arguments*. `|` is not an argument!! – F. Hauri - Give Up GitHub Feb 28 '23 at 06:37
  • Have a look how I [*parallelize* this kind of log file filtering](https://stackoverflow.com/a/75548027/1765658) – F. Hauri - Give Up GitHub Feb 28 '23 at 07:02
  • 1
    @F.Hauri-GiveUpGitHub The OP want an AND filter, not OR. This is why they pipe. – Renaud Pacalet Feb 28 '23 at 07:04
  • @RenaudPacalet If so, the *REGEX* will become `pattern1.*pattern2`. Anyway, reading how the commands are built, I can't be sure about your assertion. – F. Hauri - Give Up GitHub Feb 28 '23 at 07:07
  • @F.Hauri-GiveUpGitHub First sentence of the question: "_I need to find multiple keywords in log file (AND conditions)_" (plus their attempt to build a pipe of multiple grep). And note that if the keyword order can be anything, the `pattern1.*pattern2` regex style quickly becomes very large when the number of keywords increases. That is, not scalable. Moreover, the keywords will not be considered as whole words any more while they apparently want this (`-w` in their attempt). – Renaud Pacalet Feb 28 '23 at 07:12
  • @RenaudPacalet If so, I really think [Parallelize stream processing using bash](https://stackoverflow.com/a/75548027/1765658) could be a way to go. – F. Hauri - Give Up GitHub Feb 28 '23 at 07:16
  • Or else a *regex* like `pattern1.*pattern2|pattern2.*pattern1`, or `\bpattern1\b.*\bpattern2\b|\bpattern2\b.*\bpattern1\b`. – F. Hauri - Give Up GitHub Feb 28 '23 at 07:17

4 Answers

2

As you want to match all regular expressions in filter_list (AND condition), grep is maybe not the best choice. Assuming the expressions you search for do not contain newlines (if they do, use a different separator), you could try this GNU awk script:

awk -v w="$(printf '%s\n' "${filter_list[@]}")" '
  BEGIN {IGNORECASE = 1; split(w,res,"\n"); for(i in res) res[i] = "\\<" res[i] "\\>"}
  {for(i in res) if($0 !~ res[i]) next; print}' "$log_path"

Explanation:

printf '%s\n' "${filter_list[@]}" outputs all your regular expressions terminated by a newline character. This is passed to awk as variable w.

The BEGIN block sets IGNORECASE (you apparently want case-insensitive matching), splits variable w on newline characters, stores the result in awk array res and, for each regular expression REGEX in res, modifies it as \\<REGEX\\> (you apparently want to match whole words).

The other block applies to all lines of $log_path. It loops over the res entries, checks whether the current line matches each one, and skips the line if it doesn't.

Note: this assumes that what you search for is a set of regular expressions (you use the -E grep option, not -F). If your keywords are to be matched as plain-text strings, you will have to escape all regular expression operators in them. For example, to match the literal .*: filter_list=(mod_jk "Dec 04" '\\.\\*').
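To see the script in action, here is a self-contained run against a few made-up Apache-style log lines (GNU awk assumed, since IGNORECASE is a gawk feature):

```shell
filter_list=(mod_jk "Dec 04")
log_path=$(mktemp)
cat > "$log_path" <<'EOF'
[Dec 04 05:04:04] mod_jk child workerEnv in error state 6
[Dec 04 05:05:05] jk2_init() Found child 6725 in scoreboard slot 10
[Dec 05 06:06:06] mod_jk child workerEnv in error state 7
EOF

awk -v w="$(printf '%s\n' "${filter_list[@]}")" '
  BEGIN {IGNORECASE = 1; split(w,res,"\n"); for(i in res) res[i] = "\\<" res[i] "\\>"}
  {for(i in res) if($0 !~ res[i]) next; print}' "$log_path"
# only the first line contains both mod_jk AND "Dec 04"
rm -f "$log_path"
```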

Renaud Pacalet
  • Wow! Doing a loop inside `awk` seems heavy – F. Hauri - Give Up GitHub Feb 28 '23 at 08:11
  • Much faster than a long pipe or bash loops. Loops are just a basic `awk` feature. – Renaud Pacalet Feb 28 '23 at 08:12
  • If your file is big, you'd better build a *regex* in form `\bpattern1\b.*\bpattern2\b|\bpattern2\b.*\bpattern1\b` – F. Hauri - Give Up GitHub Feb 28 '23 at 08:15
  • @F.Hauri-GiveUpGitHub The OP did not specify that the regular expressions shall match in order. So the regular expression you suggest would have to match all possible orders and this is not scalable when the number of regular expressions increases (120 different orders for only 5 regular expressions, 3628800 for 10). – Renaud Pacalet Feb 28 '23 at 08:17
  • @F.Hauri-GiveUpGitHub Moreover assuming that matching a complex regular expression is faster than matching several simpler ones is... an assumption that should be proved. I would be very surprised that matching one complex regular expression which is, let's say, a disjunction of 120 complex terms is faster than matching 5 simple terms. – Renaud Pacalet Feb 28 '23 at 08:23
  • @RenaudPacalet Thank you. This works for me; it seems complex, but I will take it. – busterSg Feb 28 '23 at 08:26
  • @F.Hauri-GiveUpGitHub Sorry, forgot to mention the regular expressions must match in ANY order. The script is supposed to eat any logs the user throws at it.. all kinds of logs, from standard Apache to self-coded apps. – busterSg Feb 28 '23 at 08:28
  • @busterSg Too late! I have it! **A lot quicker!!** [`bash` using `sed`](https://stackoverflow.com/a/75589496/1765658) – F. Hauri - Give Up GitHub Feb 28 '23 at 08:29
1

Searching for multiple patterns with an AND condition

Using sed will be a lot quicker than using a loop on each line:

filter_list=(mod_jk "Dec 04") # array
printf -v sedcmd '/\\b%s\\b/{' "${filter_list[@]}"
printf -v toadd '%*s' ${#filter_list[@]}
sed -ne "$sedcmd"p${toadd// /\}} <file 
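It can help to see what those printf calls build: one opening `/regex/{` block per pattern, then `p`, then one closing brace per pattern (note that `\b` is a GNU sed extension). A sketch of the expansion for the list above:

```shell
filter_list=(mod_jk "Dec 04")
printf -v sedcmd '/\\b%s\\b/{' "${filter_list[@]}"  # format string reused once per element
printf -v toadd '%*s' ${#filter_list[@]}            # as many spaces as elements
sedprog=${sedcmd}p${toadd// /\}}                    # each space becomes a '}'
echo "$sedprog"
# /\bmod_jk\b/{/\bDec 04\b/{p}}
```

Lines reaching the innermost block matched every pattern, so only those are printed.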
F. Hauri - Give Up GitHub
  • I tested your solution and it works too. I will probably find one big log file and take some benchmarks. Heard a user has a 50 GB log file to process. Thank you. – busterSg Feb 28 '23 at 08:41
  • @busterSg `sed` is very powerful with big files! – F. Hauri - Give Up GitHub Feb 28 '23 at 08:45
  • @F.Hauri-GiveUpGitHub You apparently believe that loops are naturally slow. If you try `time awk 'BEGIN {for(i=1;i<10000000;i++) null;}' /dev/null` and `time for((i=1;i<10000000;i++)); do true; done` you'll see that `awk` is not yet another shell like `bash`. Your `sed` script is nice but it's just an unrolled loop. So it could be that it is faster that an equivalent `awk` loop but I would be surprised if it was "_a lot quicker_". And if it is, the reason is probably more in the way these utilities implement regex than in loop / no loop. – Renaud Pacalet Feb 28 '23 at 16:48
  • @RenaudPacalet Please test and compare! My solution is about 2x faster! `sed` is lighter and simpler than `awk`, and I have often observed that `sed` is quicker than `grep`, `awk` and lots of others (like `perl -ne`)... mostly on big files... – F. Hauri - Give Up GitHub Feb 28 '23 at 16:54
  • @RenaudPacalet And I won't try to compare a bash loop with an awk loop! I know an `awk` loop is even *quicker* than a `bc` loop (which is quicker than a shell loop: https://stackoverflow.com/a/67498861/1765658 ) – F. Hauri - Give Up GitHub Feb 28 '23 at 17:01
  • @F.Hauri-GiveUpGitHub How can I add another time period filter to the sed command? sed -n '/Dec 04 13:55/,/Dec 04 19:55/p' /Apache_2k.log – busterSg Mar 06 '23 at 02:40
  • @busterSg Have a look there: [filtering log on date range](https://stackoverflow.com/a/41831934/1765658) . – F. Hauri - Give Up GitHub Mar 06 '23 at 06:45
0

Use this to retrieve multiple keywords:

grep -E "keyword1|keyword2|keyword3" "$filename"

  • This searches for `keyword1` OR `keyword2` OR `keyword3`. OP wants to search for all these words ("AND" instead of "OR"). – bfontaine Feb 28 '23 at 16:49
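For the AND semantics the question asks for, the standard grep idiom is a pipeline instead, each stage narrowing the output of the previous one; with the question's own filters:

```shell
# Keep only lines containing BOTH whole words, in any order, case-insensitively
grep -Ewi "mod_jk" "$log_path" | grep -Ewi "Dec 04"
```

This matches the keywords in any order, at the cost of one grep process per keyword.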
0

Try to do the grep in the loop, like:

filter_list=(mod_jk "Dec 04") # array
for f in "${filter_list[@]}" # loop over array values
do
    grep -i "$f" *.txt
done
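If the number of keywords is only known at run time, the same AND filtering can also be done without building a command string; a sketch using a loop over patterns (the `and_grep` helper is hypothetical, and it buffers the whole input in memory, unlike a real pipeline):

```shell
# Keep only lines matching ALL given patterns (whole words, case-insensitive)
and_grep() {
  local out pat
  out=$(cat)                               # read all input once
  for pat in "$@"; do
    out=$(grep -Ewi -- "$pat" <<<"$out") || return 1
  done
  printf '%s\n' "$out"
}

filter_list=(mod_jk "Dec 04")
and_grep "${filter_list[@]}" < "$log_path"
```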
Julian