
To find files containing two strings, I used:

grep -l "$string1" `grep -l "$string2" /path/to/files/*.txt`

Below is a complete description of the sample inputs and the resulting output.

script testing.sh

#!/bin/bash
string1="hello"
string2="good"
string3="world"
grep -l "$string1" `grep -l "$string2" /path/to/files/*.txt`

file1.txt

hi
good morning
everyone

file2.txt

hello everyone
good morning world
have a great day

file3.txt

hola
good day today
hello people
sunny morning 

Output on running the script:

/path/to/files/file2.txt
/path/to/files/file3.txt
Suga
  • Neither the script you posted nor the answer you accepted find files with 3 strings. It's not even stated in the question what your requirements are for the various important scenarios that are typically encountered with this sort of problem. If you'd like a robust solution to this problem then post a new question. – Ed Morton Feb 13 '21 at 19:15

3 Answers


Your approach can be extended to multiple strings, though you should probably switch from backticks to modern $(...) command substitution syntax.

grep -l "$string1" $(grep -l "$string2" $(grep -l "$string3" /path/to/files/*.txt))

(For the record, the historical backticks could be nested too, though it would get ugly:

grep -l "$string1" `grep -l "$string2" \`grep -l "$string3" /path/to/files/*.txt\``

but I'm not sure whether the quotes inside would survive, and you really should have stopped using this syntax in the previous millennium.)

You could also split the processing into a pipeline with xargs:

grep -l "$string1" /path/to/files/*.txt |
xargs grep -l "$string2" |
xargs grep -l "$string3"

Scanning the files three times is pretty inefficient if these are large files, though. You could write a simple Awk script to scan each file only once.

awk 'FNR==1 { s=t=u=0 }                      # new file: reset the three flags
    /string1/ { s=1 }                        # saw string1 in this file
    /string2/ { t=1 }                        # saw string2 in this file
    /string3/ { u=1 }                        # saw string3 in this file
    # all three seen: print the file name and move on to the next file
    s && t && u { print FILENAME; nextfile }' /path/to/files/*.txt

If your Awk is really old it might not support nextfile.
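
In that case, a sketch of a fallback that drops nextfile and instead remembers, in an extra flag, whether the current file's name was already printed (at the cost of reading each matching file to the end):

awk 'FNR==1 { s=t=u=p=0 }                    # p: already printed this file?
    /string1/ { s=1 }
    /string2/ { t=1 }
    /string3/ { u=1 }
    s && t && u && !p { print FILENAME; p=1 }' /path/to/files/*.txt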

The logic should be straightforward: three booleans record, for each string, whether it has been seen in the current file. If they are all true, we are done with this file and print its name to indicate success. When we reach a new file (where the per-file line number FNR is reset to 1), we start over with all booleans set to zero (false).
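
For the record, if the strings live in shell variables as in your script, a sketch of the same idea passes them in with -v and matches them as fixed substrings with index() (note this is literal matching rather than regex, and -v values undergo backslash processing):

awk -v s1="$string1" -v s2="$string2" -v s3="$string3" '
    FNR==1 { s=t=u=0 }                       # new file: reset flags
    index($0, s1) { s=1 }
    index($0, s2) { t=1 }
    index($0, s3) { u=1 }
    s && t && u { print FILENAME; nextfile }' /path/to/files/*.txt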

tripleee
  • I'm new to awk, slowly learning. Doesn't the `FNR` reset to 1 upon reaching a new file anyway (https://stackoverflow.com/a/32482115/2923937)? If yes, why explicitly set it to 1? – Perplexabot Sep 06 '19 at 06:42
  • @Perplexabot `FNR==1` is not an assignment, it's a condition which fires when this expression is true. Notice the double equals (single `=` is assignment in Awk; double is comparison). – tripleee Sep 06 '19 at 06:43

Using find:

find /path/to/files -type f -name '*.txt' \
     -exec grep -qF "$string1" {} \; \
     -exec grep -qF "$string2" {} \; \
     -exec grep -qF "$string3" {} \; \
     -print

Note that this will list matching files in subfolders too. To prevent that, you can either insert -maxdepth 1 (a GNU extension) after /path/to/files, or use the portable version further below.
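
A sketch of the -maxdepth variant (assuming GNU find):

find /path/to/files -maxdepth 1 -type f -name '*.txt' \
     -exec grep -qF "$string1" {} \; \
     -exec grep -qF "$string2" {} \; \
     -exec grep -qF "$string3" {} \; \
     -print

And the portable version: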

cd /path/to/files
find . ! \( -type d -path '*/*' -prune \) \
     -type f -name '*.txt' \
     -exec grep -qF "$string1" {} \; \
     -exec grep -qF "$string2" {} \; \
     -exec grep -qF "$string3" {} \; \
     -print

You can make this work with an arbitrary number of strings, by the way. Say you have a hundred strings in a file called file. First, you'd need to read them into an array (mapfile requires bash 4 or newer):

mapfile -t strs <file

Then, using this array, you'd generate another array of arguments for find, and use it like this:

args=()                          # will hold one -exec grep test per string
for str in "${strs[@]}"; do
  args+=('-exec' 'grep' '-qF' "$str" '{}' ';')
done

find /path/to/files -type f -name '*.txt' "${args[@]}" -print
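
For example, with the sample files from the question, seeding file with the three strings and running the commands above should print only /path/to/files/file2.txt, since file1.txt lacks hello and file3.txt lacks world:

printf '%s\n' hello good world >file
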
oguz ismail
  • The "hundred strings" extension looks good, though perhaps already before that point switch to the Awk script I proposed. The addition of the `-F` option to `grep` is a good idea if the strings are static, but of course, we don't know that from the information in the question. – tripleee Sep 06 '19 at 05:52
  • @tripleee Thanks for the comment. Your awk script would need some changes in such a case, though – oguz ismail Sep 06 '19 at 05:57
  • Yeah, absolutely. It should not be hard to generalize to read patterns from a file but there's probably already a duplicate question about that anyway. Here's a good start: https://stackoverflow.com/questions/28896544/grep-to-match-all-the-patterns-from-a-file – tripleee Sep 06 '19 at 06:00
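
For reference, a minimal sketch of that generalization: read fixed strings, one per line, from a hypothetical patterns.txt, and require all of them in each file (nextfile again assumes a reasonably modern Awk):

awk 'NR==FNR { pat[++n] = $0; next }         # first file: collect the patterns
     FNR==1  { split("", seen); found = 0 }  # new data file: reset per-file state
     { for (i = 1; i <= n; i++)
         if (!(i in seen) && index($0, pat[i])) { seen[i] = 1; found++ } }
     found == n { print FILENAME; nextfile }' patterns.txt /path/to/files/*.txt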

Extending your approach, you can search for three strings like this:

grep -l '$string1' $(grep -l '$string2' `grep -l '$string3' /path/to/files/*.txt`)
bykebyn
  • Presumably `$string1` is a variable which will not be expanded inside single quotes. You might want to switch the backticks to `$(...)` for consistency and readability. – tripleee Sep 05 '19 at 04:02