0

I'm trying to search all files for a pattern that spans multiple lines, and then return a list of file names that match the pattern.

I'm using this line:

find . -name "$file_to_check" 2>/dev/null | xargs grep "$2" >> "$grep_out"

This will create a list of files and the line the matched pattern is found on within $grep_out. The problem with this is that the search doesn't span multiple lines. I've read that grep cannot span multiple lines, so I'm looking to replace grep with sed or awk.

The only part I think needs to change is the grep. When I run sed or awk from the terminal, I get a large printout of the file contents that match the pattern. All I want is the filename, not the context of the match. Is there a way to have sed print the filename rather than the context? Or have sed return true/false when it finds a match, so that I can save the name of the file that was being searched?

user1472747

2 Answers

4

Most text processing tools are line-oriented by default. If we instead read records as paragraphs, using blank lines as record separators (an empty RS in awk), a pattern can match across the lines within a paragraph:

awk -v RS= -v pattern="$2" '$0 ~ pattern {print FILENAME; exit}' file

or

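find . -options ... -print0 | xargs -0 awk -v RS= -v pattern="$2" '$0 ~ pattern {print FILENAME; nextfile}'

When awk is given several files, "exit" would stop after the first matching file, so the multi-file form uses "nextfile" (supported by gawk and most current awk implementations) to move on to the next file instead.

I'm assuming your pattern does not contain consecutive newlines (i.e. blank lines).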
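
As a small, self-contained illustration of paragraph mode, the sketch below builds two throwaway files and lists the one where the pattern spans two lines of the same paragraph. The directory, the file names and the function name list_paragraph_matches are made up for the demo. It uses a per-file flag rather than "exit", so that every matching file is reported once when awk receives several files.

```shell
#!/bin/sh
# Hypothetical demo data: in a.txt "alpha" and "beta" share a paragraph,
# in b.txt a blank line puts them in separate paragraphs.
demo_dir=$(mktemp -d)
printf 'alpha\nbeta\n\ngamma\n' > "$demo_dir/a.txt"
printf 'alpha\n\nbeta\n'        > "$demo_dir/b.txt"

list_paragraph_matches() {
    # $1 = directory to search, $2 = regex (awk -v turns \n into a newline)
    find "$1" -type f -print0 |
        xargs -0 awk -v RS= -v pattern="$2" '
            FNR == 1                {seen = 0}                  # first record of a new file
            !seen && $0 ~ pattern   {print FILENAME; seen = 1}  # report each file once
        '
}

matches=$(list_paragraph_matches "$demo_dir" 'alpha\nbeta')
echo "$matches"
rm -rf "$demo_dir"
```

Only a.txt should be listed, since the blank line in b.txt splits the two words into different records.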


To check if a file contains "word1[anything]word2[anything]word3"

  1. brute force: read the entire file and then do a regex comparison with bash:

    contents=$(< "$file")
    if [[ $contents =~ "$word1".*"$word2".*"$word3" ]]; then
        echo "match"
    else
        echo "no match"
    fi
    

  2. line-by-line with awk, using a state machine:

    awk -v w1="$word1" -v w2="$word2" -v w3="$word3" '
        $0 ~ w1            {have_w1 = 1}
        have_w1 && $0 ~ w2 {have_w2 = 1}
        have_w2 && $0 ~ w3 {have_w3 = 1; exit}
        END                {exit (! have_w3)}
    ' filename

Ah, strike #2: it would also match the single line "word3word2word1", so it does not enforce the order of the words.
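
One possible way to combine the brute-force check with find is to wrap it in a function and call it for each file. This is only a sketch: the function name contains_in_order and the demo files are hypothetical, it needs bash (for [[ =~ ]] and read -d), and it reads each file fully into memory, so it is best suited to small files.

```shell
#!/usr/bin/env bash
# Sketch: report files that contain three words in order, with anything
# (including newlines) between them.
contains_in_order() {
    # $1 = file, $2..$4 = words, matched literally
    local contents
    contents=$(< "$1")
    # Quoted expansions are literal text; the unquoted .* is regex syntax.
    [[ $contents =~ "$2".*"$3".*"$4" ]]
}

# Hypothetical demo data: one file with the words in order across lines,
# one with them in reverse order on a single line.
demo_dir=$(mktemp -d)
printf 'word1\nfoo\nword2\nbar\nword3\n' > "$demo_dir/ordered.txt"
printf 'word3word2word1\n'               > "$demo_dir/reversed.txt"

found=$(
    find "$demo_dir" -type f -print0 |
        while IFS= read -r -d '' file; do
            if contains_in_order "$file" word1 word2 word3; then
                printf '%s\n' "${file##*/}"   # print the name without the directory
            fi
        done
)
echo "$found"
rm -rf "$demo_dir"
```

Unlike the state machine, the single regex enforces the order, so the reversed file is not reported.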

glenn jackman
  • It does not contain consecutive newlines. Though I have another question for you - how do I use awk to search for 3 words which have anything in between them? What is the pattern for: awk "word1[anything]word2[anything]word3" filename where [anything] could be any number of any characters, including newlines. – user1472747 Nov 25 '13 at 19:33
  • How would I incorporate the brute force code? Would I use the awk line you specified above, and put the brute force code in curly brackets? – user1472747 Dec 02 '13 at 13:46
1

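# printf avoids the trailing newline that echo adds, which tr would turn into a trailing space
pattern=$( printf '%s' "whatever your search pattern is" | tr '\n' ' ' )

for FILE in *
do
    # -q suppresses grep's output, so only the file name is printed
    tr '\n' ' ' <"$FILE" | if grep -q "$pattern"; then echo "$FILE"; fi
done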

Just replace the newlines with spaces, both in your pattern and in the input to grep.

With find, you could do it like this:

#!/bin/bash

# IFS= and -r keep leading whitespace and backslashes in file names intact
find . -name "$file_to_check" 2>/dev/null | while IFS= read -r FILE
do
    tr '\n' ' ' <"$FILE" | if grep -q "word1.*word2.*word3" ; then echo "$FILE" ; fi
done >grep_out

As for the search pattern: ".*" means "any number of any characters".

Remember that characters with a special meaning in a grep pattern must be escaped if you want to match them literally: "." becomes "\." and "^" becomes "\^".
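
To illustrate the escaping point, here is a minimal sketch with a made-up sample string. It compares an unescaped ".", an escaped "\.", and grep's -F option, which treats the whole pattern as a fixed string so nothing needs escaping.

```shell
#!/bin/sh
# Hypothetical sample text: contains "1x2" but no literal "1.2".
sample='version 1x2'

# An unescaped "." matches any single character, so "1.2" matches "1x2".
if printf '%s\n' "$sample" | grep -q '1.2'; then unescaped=match; else unescaped=no-match; fi

# "\." matches only a literal dot, so this does not match.
if printf '%s\n' "$sample" | grep -q '1\.2'; then escaped=match; else escaped=no-match; fi

# -F disables regex interpretation entirely, so "1.2" is taken literally.
if printf '%s\n' "$sample" | grep -qF '1.2'; then fixed=match; else fixed=no-match; fi

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
```

Note that -F only helps when the whole pattern is literal; a pattern such as "word1.*word2" still needs regex matching, with any literal dots inside the words escaped.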

thom
  • How can I use this with my "find" command? More specifically, how can I use this with the pipe? – user1472747 Dec 02 '13 at 14:07
  • Thank you. How would I write the pattern? I'm looking for 3 words with anything, including newlines, in between them. – user1472747 Dec 02 '13 at 16:08
  • Perfect! You wouldn't happen to know how to terminate the grep search if a certain word is found, would you? By that I mean, if "word7" is found anywhere in word1.*word2.*word3, stop the filename from printing. – user1472747 Dec 02 '13 at 17:41
  • I updated the answer by adding the "-q" option, which makes grep quit as soon as a match is found. – thom Dec 02 '13 at 17:49
  • I made a new question for what I was trying to ask you: http://stackoverflow.com/questions/20334733/how-to-inverse-match-blacklist-with-regex – user1472747 Dec 02 '13 at 17:58