162

I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'

But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't match multiline patterns.

Reinstate Monica Please
Oli
  • 2
    Possible duplicate of [How to find patterns across multiple lines using grep?](https://stackoverflow.com/questions/2686147/how-to-find-patterns-across-multiple-lines-using-grep) – kenorb Apr 15 '18 at 01:53
  • 3
    This one's older, so I'd say it's not a duplicate :) – rogerdpack Dec 03 '18 at 16:16
  • @rogerdpack When marking questions as duplicates, the age of a question is a tertiary concern, after the amount and quality of answers and the quality of the question. – tripleee Jul 13 '19 at 09:01
  • Makes sense, voting to close since it's a "duplicate now" – rogerdpack Jan 20 '22 at 18:14

13 Answers

120

Why don't you go for awk:

awk '/Start pattern/,/End pattern/' filename
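
A minimal sketch of the same range pattern run over several files at once, printing the file name and line number of each matched line. The directory, file names, and contents are made up for illustration; `FILENAME` and `FNR` are standard awk built-ins.

```shell
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf 'a\nStart pattern\nb\nEnd pattern\nc\n' > "$tmp/one.txt"
printf 'x\ny\n' > "$tmp/two.txt"

# FILENAME is the current input file, FNR the line number within that file.
range_out=$(awk '/Start pattern/,/End pattern/ { print FILENAME ":" FNR ":" $0 }' "$tmp"/*.txt)
printf '%s\n' "$range_out"

rm -rf "$tmp"
```

One thing to keep in mind: awk does not reset a range at file boundaries, so a start line without a matching end line keeps the range open into the next file.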
TheDude
Amit
  • 2
    This is much easier to understand and uses `awk` that comes with most *nix systems. – Ali Karbassi Jan 28 '11 at 03:12
  • 33
    nice! is there a way to make this match non-greedy? – marcin Jul 04 '12 at 17:16
  • 3
    How would you only print the filename when there is a match? – Bibek Shrestha Sep 03 '12 at 14:07
  • 2
    You can show the line numbers of the matches with `awk '/Start pattern/,/End pattern/ {printf NR " "; print}' filename`. You can make it prettier by giving the line numbers a fixed width: `awk '/Start pattern/,/End pattern/ {printf "%-4s ", NR; print}' filename`. – Robert Jan 06 '15 at 13:12
  • 1
    This seems to work nicely on single file, however, what if I would like to search within multiple files? – Jinstrong Jun 29 '18 at 03:34
  • @marcin, I just tried this with gnu awk 4.2.1 and it appears to be greedy only with regard to the Start pattern, by default, since it just searches for the end pattern after finding the start pattern. – Michael Goldshteyn Jul 28 '18 at 20:26
  • @Jinstrong use pipes. for example, `find . -name "*.txt" | xargs -n1 awk '/foo/,/bar/'` will recursively search all txt files in the current directory. – hoefling Sep 09 '18 at 11:33
  • Use grep to find the list of files which contain the basic word/words you're looking for, and then use awk to drill into each file via a for...in loop – Paul Allsopp Sep 26 '18 at 22:47
  • Apparently making this non greedy is "non trivial" https://unix.stackexchange.com/questions/49601/how-to-reduce-the-greediness-of-a-regular-expression-in-awk however the `pcregrep` command can do so. – rogerdpack Dec 03 '18 at 17:19
  • Thanks for this! Helped me filter some log files that needed a multi-line match. – Nuvious Jan 03 '21 at 19:56
  • When I do this, my result contains the start/end pattern as well as what is inbetween these patterns. I am only interested in what is inbetween start and end (e.g., if the full string is `StartpatternStringEndpattern` I would like it to return `String`). Can this be done? The length of `String` varies. – d-b Jul 23 '23 at 05:43
118

Here is an example using GNU grep:

grep -Pzo '_name.*\n.*_description'

-z/--null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

For a text file that contains no NUL bytes, this has the effect of treating the whole file as one large line. See the -z description in grep's manual and also common question no. 14 on the grep manual's usage page.
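
A small sketch of how this behaves on a sample file, assuming GNU grep built with PCRE support. The file name and contents are made up. Because -z also ends each printed match with a NUL byte, the output is piped through tr to turn that into a newline.

```shell
# Hypothetical sample file in a throwaway directory.
tmp=$(mktemp -d)
printf 'x = 1\n_name = "a"\n_description = "b"\ny = 2\n' > "$tmp/sample.py"

# -P: Perl-compatible regex, -z: NUL-separated records, -o: print only the match.
# tr converts the NUL that grep -z appends after each match into a newline.
match_out=$(grep -Pzo '_name.*\n.*_description' "$tmp/sample.py" | tr '\0' '\n')
printf '%s\n' "$match_out"

rm -rf "$tmp"
```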

laconbass
ayaz
  • 2
    That only accounts for a single new-line character, I think. – Cloud Jun 07 '12 at 20:30
  • 1
    I wasn't able to use grep for multiline search, without using flags `-z` so it doesn't split search on single line, and `-o` to print only matched part. – bbaja42 Oct 09 '12 at 08:15
  • I found that -o caused it to not print anything, but -l worked to get a list of files (my command was `grep -rzl pattern *`, -rzo didn't work) – Benubird Mar 26 '13 at 10:29
  • 7
    I recommend `grep -Pazo` instead of `-Pzo` for non-ASCII files. It's better because the -z switch on non-ASCII files **may** trigger grep's "binary data" behaviour which changes the return values. Switch `-a | --text` prevents that. – rloth Jan 08 '15 at 13:45
  • Does not work on Mac with git installed by `brew reinstall --with-pcre git` – Quanlong Jun 15 '15 at 00:56
109

So I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.

The -M option makes it possible to search for patterns that span line boundaries.

For example, you need to find files where the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...
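
A plain find piped to xargs splits file names on whitespace. The sketch below passes the names NUL-separated instead, and only lists the files that match. Note that it swaps in GNU grep with -P and -z rather than pcregrep, which is an assumption about what is installed; the file names and contents are made up.

```shell
# Hypothetical sample files, one with a space in its name.
tmp=$(mktemp -d)
printf '_name = "a"\n_description = "b"\n' > "$tmp/has match.py"
printf '_name = "a"\nother = 1\n' > "$tmp/nomatch.py"

# -print0 and -0 hand file names over NUL-separated, so spaces survive.
# grep -l prints only the names of files that contain a match.
files_out=$(find "$tmp" -iname '*.py' -print0 | xargs -0 grep -lPz '_name.*\n.*_description')
printf '%s\n' "$files_out"

rm -rf "$tmp"
```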

rogerdpack
Oli
  • 7
    As mentioned by halka below, "you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression". Then use grep with perl regex by adding -P. `find . -exec grep -nHP '(?s)SELECT.{1,60}FROM.{1,20}table_name' '{}' \;` – Jim Feb 22 '13 at 13:02
  • 8
    `pcregrep` is available on the mac with `brew install pcre` – Jared Beck Jul 01 '13 at 20:16
  • 1
    Even better: also use `-H` which prints the filename before each match: `pcregrep -HM`. – Ciro Santilli OurBigBook.com Oct 21 '14 at 19:15
  • `pcregrep: line 1 of file /dev/fd/63 is too long for the internal buffer` when acting on a simple text file like `<(cat file.txt | tr '\0' '\n')`. – Myridium Jan 14 '22 at 04:04
24

grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, add -z so that the file is read as a single record and -o so that only the match is printed:

grep -Pzo '(?s)<title>.*</title>' example.html

Since the PCRE project follows Perl's regular expression syntax, use the Perl documentation for reference.

bukzor
  • 1
    Hmm tried this just now and didn't seem to work... https://gist.github.com/rdp/0286d91624930bd11d0169d6a6337c33 – rogerdpack Dec 03 '18 at 17:22
  • 1
    I didn't know *grep* had this option. Probably because of this: *This is highly experimental and grep -P may warn of unimplemented features.*; that's under CentOS 7. Under Fedora 29: *This is experimental and grep -P may warn of unimplemented features*. Of course in BSD grep it's not there at all. Would be nice if it wasn't so experimental but it's nice to be reminded of it - little though I'm likely to use it. – Pryftan Sep 23 '19 at 00:10
  • Works with `grep -Pzo` (though adds a trailing NUL char, see some of the other answers). grep -P is common in "linux" but not BSD... – rogerdpack Jan 20 '22 at 21:07
22

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html

It finds the title tag in an HTML file even if the tag spans several lines (the pattern allows up to 5 line breaks).

Here is an example with no limit on the number of lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html
rogerdpack
Oli
  • 4
    thanks for this. I was stuck not realizing that a wildcard wouldn't match the newline character. – matt Apr 25 '11 at 15:33
  • 10
    @matt: you can also persuade the dot wildcard to match newlines if you add `(?s)` to your regular expression, like so: `"(?s).*"` – lubomir.brindza Jul 22 '11 at 10:53
  • @matt Of course you can check for *`$`* (at the end of a pattern) to signify it's the end of the line - though that's not the same thing as helping you find multiple line patterns. See also *`glob(7)`*. You might also find this website of interest: https://www.regular-expressions.info – Pryftan Sep 23 '19 at 00:13
11

With The Silver Searcher (ag):

ag 'abc.*(\n|.)*efg'

The Silver Searcher's speed optimizations could pay off here.

Shwaydogg
5

@marcin: a non-greedy awk example (it stops after the first matching block):

awk '{if ($0 ~ /Start pattern/) {triggered=1;}if (triggered) {print; if ($0 ~ /End pattern/) { exit;}}}' filename
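
The one-liner above exits after the first block. A variant sketch, not part of the original answer, that keeps going through the whole file and prints only the lines strictly between the two markers. `inside` is a made-up flag variable and the sample data is hypothetical.

```shell
# Hypothetical sample file with two blocks.
tmp=$(mktemp -d)
printf 'Start pattern\none\nEnd pattern\nskip\nStart pattern\ntwo\nEnd pattern\n' > "$tmp/sample.txt"

# Rule order matters: the end rule clears the flag before the print rule runs,
# and the start rule sets it after, so neither marker line is printed.
between_out=$(awk '/End pattern/ { inside = 0 } inside { print } /Start pattern/ { inside = 1 }' "$tmp/sample.txt")
printf '%s\n' "$between_out"

rm -rf "$tmp"
```

Each block ends at the nearest End pattern line, so a later block is never merged into an earlier one.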
Martin
5

This answer might be useful:

Regex (grep) for multi-line search needed

To search recursively, you can use the flags -R (recursive) and --include (glob pattern). See:

Use grep --exclude/--include syntax to not grep through certain files
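
A rough sketch of how -R and --include might be combined with a multiline pattern, assuming GNU grep with -P and -z available. The directory layout and contents are made up for illustration.

```shell
# Hypothetical tree with one .py file and one .txt file holding the same text.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg"
printf '_name = "a"\n_description = "b"\n' > "$tmp/pkg/mod.py"
printf '_name = "a"\n_description = "b"\n' > "$tmp/pkg/notes.txt"

# -R: recurse, --include: only files whose name matches the glob,
# -l: list matching files, -P -z: Perl regex over NUL-separated records.
recursive_out=$(grep -RlPz --include='*.py' '_name.*\n.*_description' "$tmp")
printf '%s\n' "$recursive_out"

rm -rf "$tmp"
```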

Ɖiamond ǤeezeƦ
albfan
  • @Ɖiamond ǤeezeƦ note that editing a post in the LQP (https://stackoverflow.com/review/low-quality-posts/19341146) invalidates the review, so just edit if you are sure the post needs to be maintained. – fedorqui Apr 05 '18 at 13:40
4

You can use the grep alternative sift here (disclaimer: I am the author).

It supports multiline matching and can limit the search to specific file types out of the box:

sift -m --files '*.py' 'YOUR_PATTERN'

(search all *.py files for the specified multiline regex pattern)

It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.

svent
3
perl -ne 'print if (/begin pattern/../end pattern/)' filename
pbal
2

Using the ex/vi editor and the globstar option (syntax similar to awk and sed):

ex +"/aaa/,/bbb/p" -R -scq! file.txt

where aaa is your starting text and bbb is your ending text.

To search recursively, try:

ex +"/aaa/,/bbb/p" -scq! **/*.py

Note: To enable the ** syntax in Bash 4 or later, run shopt -s globstar (zsh supports ** by default).

kenorb
0

As in Amit's earlier answer, you can use awk to search across multiple lines. If you need to print the line number, use the following:

awk '/Start pattern/,/End pattern/ {print NR ":" $0}' filename
Jonathan L
0

I believe the following should work and has the advantage of only using extended regular expressions without the need to install an extra tool like pcregrep if you don’t have it yet or don’t have the -P option to grep available (e.g. macOS):

egrep -irzo ".*aaa(.*\s.*){1,}.*bbb.*" path_to_filenames

Caveat emptor: this has some slight disadvantages:

  • it will find the largest selection of lines from the first aaa to the last bbb in each file, unless...
  • there are several repetitions of the aaa [stuff] bbb pattern in each file.
TransferOrbit