Issues with regex when searching pattern on two lines

Question

I know this type of search has been addressed in a few other questions here, but for some reason I cannot get it to work in my scenario.
I have a text file that contains something similar to the following pattern:

some text here done
12345678_123456 226-
more text
some more text here done
12345678_234567 226-

I'm trying to find all cases where done is followed by 226- on the next line, together with the 16 characters preceding it. I tried grep -Pzo and pcregrep -M, but both return nothing.

I attempted multiple regex combinations to take into account the 2 lines and the 16 chars in between. This is one of the examples I tried with grep:

grep -Pzo '(?s)done\n.\{16\}226-' filename

Try `grep -Pzo 'done\R.{16}226-' filename` or `grep -Pzo '(?m)done\R.{16}226-$'` — Wiktor Stribiżew, Nov 03 '17 at 20:48

score 1 · Answer 1 · 2017-11-03T21:38:59.700

Generalize it to this (?m)done$\s+.*226-$

Because requiring a \n after 226- at end of string is a bad thing.
And not requiring a \n after 226- is also a bad thing.
Thus, the paradox is solved with (\n|$) but why the \n at all?

Both problems solved with multiline and $.
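The end-of-file problem described above can be reproduced with a small sketch. It assumes GNU grep built with PCRE support (-P); the temporary file and the variable names nonl, with_nl and without_nl are illustrative only. It avoids $ on purpose, since the handling of anchors under -Pz has varied between grep releases.

```shell
# Hypothetical input whose last line has no trailing newline.
nonl=$(mktemp)
printf 'first done\n12345678_123456 226-\nlast done\n12345678_234567 226-' > "$nonl"

# Requiring \n after 226- misses the final occurrence at end of file.
# Each -o match is terminated by a NUL under -z, so counting NULs counts matches.
with_nl=$(grep -ozP 'done\n.{16}226-\n' "$nonl" | tr -cd '\0' | wc -c)

# Dropping the trailing \n finds both occurrences,
# but no longer checks that 226- is the last thing on its line.
without_nl=$(grep -ozP 'done\n.{16}226-' "$nonl" | tr -cd '\0' | wc -c)

printf 'trailing newline required: %s match(es), not required: %s match(es)\n' \
  "$with_nl" "$without_nl"
rm -f "$nonl"
```

The first count should be lower than the second, which is the trade-off the multiline $ form is meant to remove.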

https://regex101.com/r/A33cj5/1

edited Nov 03 '17 at 21:38

answered Nov 03 '17 at 21:11

score 0 · Accepted Answer · answered Nov 03 '17 at 20:49

You must not escape { and } when using the -P (PCRE) option in grep. In PCRE, \{ and \} are literal braces; escaping them to form a repetition count is BRE syntax only.
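As a rough illustration of that difference, the sketch below runs the same escaped pattern through plain grep (BRE) and grep -P. It assumes GNU grep with PCRE support; the two-line input and the variable names are made up for the example.

```shell
# Two test lines: three a's, and the literal text a{3}.
input=$(printf 'aaa\na{3}\n')

# BRE (plain grep): \{3\} is the repetition operator, so only "aaa" matches.
bre=$(printf '%s\n' "$input" | grep 'a\{3\}')

# PCRE (grep -P): \{ and \} are literal braces, so only "a{3}" matches.
pcre_escaped=$(printf '%s\n' "$input" | grep -P 'a\{3\}')

# PCRE with unescaped braces: {3} is the repetition operator again.
pcre_plain=$(printf '%s\n' "$input" | grep -P 'a{3}')

printf 'BRE escaped:  %s\nPCRE escaped: %s\nPCRE plain:   %s\n' \
  "$bre" "$pcre_escaped" "$pcre_plain"
```

This is why .\{16\} under -P looks for a character followed by the literal text {16} instead of any 16 characters.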

You can use:

grep -ozP 'done\R.{16}226-\R' file

done
12345678_123456 226-
done
12345678_234567 226-

\R matches any Unicode newline sequence. If you are only dealing with \n, you can simply use:

grep -ozP 'done\n.{16}226-\n' file
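For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample input from the question and runs the search. It assumes GNU grep built with PCRE support; the temporary file and the names sample, matches and count are illustrative. The trailing \n is left out of the pattern so the result does not depend on how the file ends, and tr converts the NUL that -z puts after each match into a newline for display.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'some text here done' \
  '12345678_123456 226-' \
  'more text' \
  'some more text here done' \
  '12345678_234567 226-' > "$sample"

# -z treats the whole file as one record, so \n in the pattern can match;
# -o prints each match followed by a NUL byte, which tr turns into a newline.
matches=$(grep -ozP 'done\n.{16}226-' "$sample" | tr '\0' '\n')
printf '%s\n' "$matches"

# Count the matches by counting the NUL terminators grep emits.
count=$(grep -ozP 'done\n.{16}226-' "$sample" | tr -cd '\0' | wc -c)
printf 'matches found: %s\n' "$count"
rm -f "$sample"
```

Without the tr step the matches are still printed, but they are separated by NUL bytes, which most terminals do not display.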

answered Nov 03 '17 at 20:49

anubhava


You have no idea how much time I wasted messing around with this and never thought of not escaping the `{` `}`. – slybloty Nov 03 '17 at 20:53
@slybloty Escapes are tricky, and work differently in different Regex languages. For Perl, there's a simple rule: "Any escaped punctuation mark is interpreted as a literal; and any non-escaped alpha-numeric character is interpreted as a literal." `grep`, `egrep`, `vim` and others deviate from this basic rule to varying extents; just memorize the specific exceptions if you need to use those. – jpaugh Nov 03 '17 at 21:24
