Multiple line, repeated occurrence matching

Question

I am referring to the question below, but with a slight difference. I only need the line that contains "abc" when "efg" matches on a different line, and I only need the last "abc" line before each "efg" match.

How to find patterns across multiple lines using grep?

blah blah..
blah blah..
blah abc blah1
blah blah..
blah blah..
blah abc blah2
blah blah..
blah efg1 blah blah
blah efg2 blah blah
blah blah..
blah blah..

blah abc blah3
blah blah..
blah blah..
blah abc blah4
blah blah..
blah blah blah

blah abc blah5
blah blah..
blah blah..
blah abc blah6
blah blah..
blah efg3 blah blah

blah efg4 blah blah
blah abc blah7
blah blah..
blah blah..
blah abc blah8
blah blah..

Expected output

blah abc blah2
blah abc blah6

score 0 · Answer 1 · answered Feb 25 '16 at 02:09

I can see how to do this in two steps. The first identifies the blocks that run from an abc line to an efg line (a block may contain several abc lines). The second strips each block down to the two lines that matter.

Important: make sure the input contains no pairs of empty lines (\n\n), as that will break the perl step.

grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2$4/g'

For example:

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text
blah abc blah1
blah blah..
blah blah..
blah abc blah2
blah blah..
blah efg1 blah blah

blah abc blah3
blah blah..
blah blah..
blah abc blah4
blah blah..
blah blah blah
blah abc blah5
blah blah..
blah blah..
blah abc blah6
blah blah..
blah efg3 blah blah

See how the efg chunks are separated by two newlines? We then remove the cruft that doesn't matter with a perl search-and-replace regex:

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2$4/g'
blah abc blah2
blah efg1 blah blah
blah abc blah6
blah efg3 blah blah

If you want only the abc line, keep just $2 in the replacement (remove $4).

$ grep -Pzo '(.*abc.*)\n(.*\n)*?(.*efg.*\n)' text | perl -0777 -pe 's/(.+\n)*(.*abc.*\n)(.+\n)*?(.*efg.*\n)\n/$2/g'
blah abc blah2
blah abc blah6
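
One way to meet the empty-line precondition mentioned above, as a minimal sketch: delete empty lines before the input reaches grep. Empty lines contain neither abc nor efg, so dropping them does not change which lines should be reported. The helper name drop_empty_lines and the sample input are made up for illustration and are not part of the answer.

```shell
# drop_empty_lines is a hypothetical helper name, not part of the answer.
# It deletes every empty line, so the stream contains no "\n\n" sequence.
drop_empty_lines() {
  sed '/^$/d' "$@"
}

# Small made-up input with single and doubled empty lines.
cleaned=$(printf 'blah abc blah1\n\n\nblah efg1 blah\n\nblah blah..\n' | drop_empty_lines)
printf '%s\n' "$cleaned"
```

The helper reads standard input when called without arguments, or the named files otherwise, so it could be placed in front of the grep step.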

score 0 · Accepted Answer · answered Feb 25 '16 at 02:11

This might work for you (GNU sed):

sed -n '/abc/h;/efg/!b;x;/abc/p;z;x' file

Store the latest abc line in the hold space (HS). On a line containing efg, swap to the HS and, if it holds an abc line, print it. Then empty the HS (z;x) so that a later efg line with no new abc line in between prints nothing.
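
To make the individual steps easier to follow, here is the same script written with one command per line and a comment above each. This is only a sketch: it assumes GNU sed (the z command is a GNU extension), and the sample input is a shortened, made-up version of the data in the question.

```shell
# Shortened, made-up version of the input from the question.
sample=$(cat <<'EOF'
blah abc blah1
blah abc blah2
blah efg1 blah blah
blah efg2 blah blah
blah abc blah3
blah blah blah
blah abc blah6
blah efg3 blah blah
blah abc blah7
EOF
)

result=$(printf '%s\n' "$sample" | sed -n '
  # Copy the current line to the hold space if it contains abc.
  /abc/h
  # Not an efg line: branch to the end of the script (-n prints nothing).
  /efg/!b
  # efg line: swap, so the remembered line is now in the pattern space.
  x
  # Print it only if it really is an abc line (the hold space may be empty).
  /abc/p
  # Empty the pattern space ...
  z
  # ... and swap back, which leaves the hold space empty for the next block.
  x
')
printf '%s\n' "$result"
```

With this input the second efg line prints nothing, because the hold space was emptied after the first one.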

potong


Trying to understand the syntax from https://www.gnu.org/software/sed/manual/sed.html but couldn't figure out "/!b". Could you please explain a bit? – user3663854 Feb 26 '16 at 04:21
@user3663854 the address can be negated by appending `!` and the `b` command is explained [here](https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands). – potong Feb 26 '16 at 11:20
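
A small sketch of what the negated address and the b command do together. The input lines and the "matched: " prefix are invented for the illustration. With no label, b jumps to the end of the script, so the second expression is reached only by lines that contain efg.

```shell
# /efg/!b : on every line that does NOT match efg, branch to the end of the script.
# Only efg lines reach the second expression, which prefixes and prints them.
out=$(printf 'one\nefg two\nthree\n' | sed -n -e '/efg/!b' -e 's/^/matched: /p')
printf '%s\n' "$out"
```

The two -e expressions are used here instead of a semicolon after b, since a semicolon directly after a branch command is not accepted by every sed implementation.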
