How can I search for a multiline pattern in a file?

Question

I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'

But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't match multiline patterns.

Possible duplicate of [How to find patterns across multiple lines using grep?](https://stackoverflow.com/questions/2686147/how-to-find-patterns-across-multiple-lines-using-grep) — kenorb, Apr 15 '18 at 01:53
@rogerdpack When marking questions as duplicates, the age of a question is a tertiary concern, after the amount and quality of answers and the quality of the question. — tripleee, Jul 13 '19 at 09:01

score 120 · Answer 1 · edited Nov 04 '14 at 03:45

Why don't you go for awk:

awk '/Start pattern/,/End pattern/' filename
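To make the behaviour concrete: the range pattern prints every line from one that matches the start pattern through the next one that matches the end pattern, both included. Below is a minimal sketch of the same idea for several files at once, assuming you want `file:line:text` output. The function name `multiline_blocks` and the `inside` flag are my own illustration, not something awk provides:

```shell
# Hypothetical helper: print "file:line:text" for every line inside a
# start/end block, across all files given as arguments.
# Note: awk -v interprets backslash escapes in the values it is given.
multiline_blocks() {
    start=$1
    end=$2
    shift 2
    awk -v start="$start" -v end="$end" '
        FNR == 1 { inside = 0 }               # reset the state at the top of each file
        !inside && $0 ~ start { inside = 1 }  # a block opens on a start match
        inside { print FILENAME ":" FNR ":" $0 }
        inside && $0 ~ end { inside = 0 }     # and closes on the first end match
    ' "$@"
}
```

For example, `multiline_blocks 'Start pattern' 'End pattern' *.py` prints the matching blocks of every Python file in the current directory. Unlike the bare range pattern, the flag is reset at the start of each file, so an unterminated block does not leak into the next file.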

answered Sep 15 '10 at 13:26 by Amit · edited Nov 04 '14 at 03:45 by TheDude

This is much easier to understand and uses `awk` that comes with most *nix systems. – Ali Karbassi Jan 28 '11 at 03:12
nice! is there a way to make this match non-greedy? – marcin Jul 04 '12 at 17:16
How would you only print the filename when there is a match? – Bibek Shrestha Sep 03 '12 at 14:07
You can show the line numbers of the matches with `awk '/Start pattern/,/End pattern/ {printf NR " "; print}' filename`. You can make it prettier by giving the line numbers a fixed width: `awk '/Start pattern/,/End pattern/ {printf "%-4s ", NR; print}' filename`. – Robert Jan 06 '15 at 13:12
This seems to work nicely on single file, however, what if I would like to search within multiple files? – Jinstrong Jun 29 '18 at 03:34
@marcin, I just tried this with gnu awk 4.2.1 and it appears to be greedy only with regard to the Start pattern, by default, since it only searches for the End pattern after finding the Start pattern. – Michael Goldshteyn Jul 28 '18 at 20:26
@Jinstrong use pipes. for example, `find . -name "*.txt" | xargs -n1 awk '/foo/,/bar/'` will recursively search all txt files in the current directory. – hoefling Sep 09 '18 at 11:33
Use grep to find the list of files which contain the basic word/words you're looking for, and then use awk to drill into each file via a for...in loop – Paul Allsopp Sep 26 '18 at 22:47
Apparently making this non greedy is "non trivial" https://unix.stackexchange.com/questions/49601/how-to-reduce-the-greediness-of-a-regular-expression-in-awk however the `pcregrep` command can do so. – rogerdpack Dec 03 '18 at 17:19
Thanks for this! Helped me filter some log files that needed a multi-line match. – Nuvious Jan 03 '21 at 19:56
When I do this, my result contains the start/end pattern as well as what is in between these patterns. I am only interested in what is in between start and end (e.g., if the full string is `StartpatternStringEndpattern` I would like it to return `String`). Can this be done? The length of `String` varies. – d-b Jul 23 '23 at 05:43

score 118 · Answer 2 · edited Aug 25 '22 at 10:45

Here is an example using GNU grep:

grep -Pzo '_name.*\n.*_description'

-z/--null-data: Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

For a text file that contains no NUL bytes, this has the effect of treating the whole file as one large line. -P enables Perl-compatible regular expressions and -o prints only the matched part. See the -z description in grep's manual and also common question no. 14 on grep's manual usage page.
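A minimal sketch of how these flags fit together, assuming GNU grep built with PCRE support. The wrapper name `grep_multiline` is hypothetical. With -z, each match printed by -o ends in a NUL byte, so the sketch converts those to newlines for display:

```shell
# Hypothetical wrapper around GNU grep; requires a build with -P (PCRE) support.
grep_multiline() {
    pattern=$1
    shift
    # -P: Perl-compatible regex        -z: records end at NUL, so \n can be matched
    # -o: print only the matched text  -a: never treat the input as binary
    # tr turns the NUL that follows each match into a newline.
    grep -Pazo -- "$pattern" "$@" | tr '\0' '\n'
}
```

For example, `grep_multiline '_name.*\n.*_description' models.py` (with `models.py` standing in for your own file) prints just the two-line matches.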

answered Sep 30 '08 at 12:07 by ayaz · edited Aug 25 '22 at 10:45 by laconbass

That only accounts for a single new-line character, I think. – Cloud Jun 07 '12 at 20:30
I wasn't able to use grep for multiline search, without using flags `-z` so it doesn't split search on single line, and `-o` to print only matched part. – bbaja42 Oct 09 '12 at 08:15
I found that -o caused it to not print anything, but -l worked to get a list of files (my command was `grep -rzl pattern *`, -rzo didn't work) – Benubird Mar 26 '13 at 10:29
I recommend `grep -Pazo` instead of `-Pzo` for non-ASCII files. It's better because the -z switch on non-ASCII files **may** trigger grep's "binary data" behaviour, which changes the return values. The `-a`/`--text` switch prevents that. – rloth Jan 08 '15 at 13:45
Does not work on Mac with git installed by `brew reinstall --with-pcre git` – Quanlong Jun 15 '15 at 00:56

score 109 · Accepted Answer · edited Jan 20 '22 at 17:54

So I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.

The -M option makes it possible to search for patterns that span line boundaries.

For example, you need to find files where the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...

answered Sep 30 '08 at 11:54 by Oli · edited Jan 20 '22 at 17:54 by rogerdpack

As mentioned by halka below, "you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression". Then use grep with perl regex by adding -P. find . -exec grep -nHP '(?s)SELECT.{1,60}FROM.{1,20}table_name' '{}' \; – Jim Feb 22 '13 at 13:02
`pcregrep` is available on the mac with `brew install pcre` – Jared Beck Jul 01 '13 at 20:16
Even better: also use `-H` which prints the filename before each match: `pcregrep -HM`. – Ciro Santilli OurBigBook.com Oct 21 '14 at 19:15
`pcregrep: line 1 of file /dev/fd/63 is too long for the internal buffer` when acting on a simple text file like `<(cat file.txt | tr '\0' '\n')`. – Myridium Jan 14 '22 at 04:04

score 24 · Answer 4 · answered Jul 26 '12 at 18:47

grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, you can use this (the -z option is needed so that grep sees the whole file as one record instead of one line at a time):

grep -Pzo '(?s)<title>.*</title>' example.html

Since the PCRE project implements Perl's regular expression syntax, use the Perl documentation for reference.

answered Jul 26 '12 at 18:47 by bukzor

Hmm tried this just now and didn't seem to work... https://gist.github.com/rdp/0286d91624930bd11d0169d6a6337c33 – rogerdpack Dec 03 '18 at 17:22
I didn't know *grep* had this option. Probably because of this: *This is highly experimental and grep -P may warn of unimplemented features.*; that's under CentOS 7. Under Fedora 29: *This is experimental and grep -P may warn of unimplemented features*. Of course in BSD grep it's not there at all. Would be nice if it wasn't so experimental but it's nice to be reminded of it - little though I'm likely to use it. – Pryftan Sep 23 '19 at 00:10
Works with `grep -Pzo` (though adds a trailing NUL char, see some of the other answers). grep -P is common in "linux" but not BSD... – rogerdpack Jan 20 '22 at 21:07

score 22 · Answer 5 · edited Dec 03 '18 at 18:09

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html

It finds the title tag in an HTML file even if it spans up to 5 lines.

Here is an example with an unlimited number of lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html

answered Sep 30 '08 at 12:36 by Oli · edited Dec 03 '18 at 18:09 by rogerdpack

thanks for this. I was stuck not realizing that a wildcard wouldn't match the newline character. – matt Apr 25 '11 at 15:33
@matt: you can also persuade the dot wildcard to match newlines if you add `(?s)` to your regular expression, like so: `"(?s).*"` – lubomir.brindza Jul 22 '11 at 10:53
@matt Of course you can check for *`$`* (at the end of a pattern) to signify it's the end of the line - though that's not the same thing as helping you find multiple line patterns. See also *`glob(7)`*. You might also find this website of interest: https://www.regular-expressions.info – Pryftan Sep 23 '19 at 00:13

score 11 · Answer 6 · answered Jan 13 '15 at 21:05

With The Silver Searcher (ag):

ag 'abc.*(\n|.)*efg'

The Silver Searcher's speed optimizations could possibly shine here.

answered Jan 13 '15 at 21:05 by Shwaydogg

score 5 · Answer 7 · answered Jul 23 '15 at 13:53

@Marcin: awk example non-greedy:

awk '/Start pattern/ {triggered=1} triggered {print; if (/End pattern/) exit}' filename
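Note that the one-liner above exits after the first block. If you want every block, and only the lines strictly between the two markers, a sketch along these lines should do. It assumes the start and end markers sit on lines of their own, and `between_blocks` is a made-up name:

```shell
# Hypothetical helper: print only the lines strictly between each
# start/end pair, for every block in the file (not just the first).
# Usage: between_blocks START_REGEX END_REGEX FILE
between_blocks() {
    awk -v start="$1" -v end="$2" '
        inside && $0 ~ end { inside = 0; next }  # closing line: stop, do not print it
        inside { print }                         # a line inside a block
        $0 ~ start { inside = 1 }                # opening line: printing starts after it
    ' "$3"
}
```

Because the end pattern is checked first and closes the block at its first match, each block is as short as possible.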

answered Jul 23 '15 at 13:53 by Martin

score 5 · Answer 8 · edited Apr 05 '18 at 13:19

This answer might be useful:

Regex (grep) for multi-line search needed

To search recursively, you can use the flags -R (recursive) and --include (glob pattern). See:

Use grep --exclude/--include syntax to not grep through certain files
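For instance, combining those flags with the -z approach from the other answers gives a recursive multiline search. A sketch, assuming GNU grep with -P support; `list_multiline_matches` is a hypothetical helper name:

```shell
# Hypothetical helper: list files under a directory whose names match a glob
# and whose contents match a multi-line PCRE pattern.
# Usage: list_multiline_matches DIR GLOB PATTERN
list_multiline_matches() {
    dir=$1
    glob=$2
    pattern=$3
    # -R: recurse into DIR        -l: print only the names of matching files
    # -a: treat input as text     -z: let the pattern see newlines
    # -P: Perl-compatible regex   --include: only search files matching GLOB
    grep -RPazl --include="$glob" -- "$pattern" "$dir"
}
```

For example, `list_multiline_matches . '*.py' '_name.*\n.*_description'` lists the Python files below the current directory that contain the two-line pattern.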

answered Aug 24 '11 at 03:19 by albfan · edited Apr 05 '18 at 13:19 by Ɖiamond ǤeezeƦ

@Ɖiamond ǤeezeƦ note that editing a post in the LQP (https://stackoverflow.com/review/low-quality-posts/19341146) invalidates the review, so just edit if you are sure the post needs to be maintained. – fedorqui Apr 05 '18 at 13:40

score 4 · Answer 9 · answered Feb 22 '15 at 22:50

You can use the grep alternative sift here (disclaimer: I am the author).

It supports multiline matching and limiting the search to specific file types out of the box:

sift -m --files '*.py' 'YOUR_PATTERN'

(search all *.py files for the specified multiline regex pattern)

It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.

score 3 · Answer 10 · edited Apr 04 '16 at 01:27

perl -ne 'print if (/begin pattern/../end pattern/)' filename

answered Apr 04 '16 at 00:51 by pbal · edited Apr 04 '16 at 01:27

This prints the whole file though – Herbert Oct 03 '18 at 22:13
This worked for me, just the block I needed, on OS X. – JonTheNiceGuy Oct 29 '21 at 13:01

kenorb · Answer 11 · 2018-04-15T01:30:35.083

Using ex/vi editor and globstar option (syntax similar to awk and sed):

ex +"/aaa/,/bbb/p" -R -scq! file.txt

where aaa is your starting point, and bbb is your ending text.

To search recursively, try:

ex +"/aaa/,/bbb/p" -scq! **/*.py

Note: To enable the ** syntax in Bash 4 or later, run shopt -s globstar; zsh supports ** without it.

answered Oct 16 '15 at 23:11 by kenorb · edited Apr 15 '18 at 01:30

score 0 · Answer 12 · answered Mar 25 '23 at 04:53

As in Amit's earlier answer, you can use awk to match across multiple lines. If you need to print the line numbers, use the following:

awk '/Start pattern/,/End pattern/ {print NR ":" $0}' filename

answered Mar 25 '23 at 04:53 by Jonathan L

score 0 · Answer 13 · answered Apr 17 '23 at 18:42

I believe the following should work. It has the advantage of using only extended regular expressions, without the need to install an extra tool like pcregrep if you don't have it yet or don't have the -P option to grep available (e.g. macOS):

egrep -irzo ".*aaa(.*\s.*){1,}.*bbb.*" path_to_filenames

Caveat emptor: this does have some slight disadvantages:

it will find the largest selection of lines from the first aaa to the last bbb in each file, unless...
there are several repetitions of the aaa [stuff] bbb pattern in each file.
