
I have a certain pattern in my file, like so:

....
BEGIN
any text1
any text2
END
....
BEGIN
any text3
garbage text
any text4
END
....
BEGIN
any text5
any text6
END
...

BEGIN and END are my markers, and I want to extract all the text between the markers only if the block does not contain 'garbage text'. So I expect to extract the blocks below:

any text1
any text2

any text5
any text6

How do I do it in awk? I know I can do something like:

awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log

to extract the lines between the two markers, but how do I refine the results further by filtering on the absence of 'garbage text'?

Ashwin Prabhu

1 Answer

$ awk '/END/{if (rec !~ /garbage text/) print rec} {rec=rec $0 ORS} /BEGIN/{rec=""}' file
any text1
any text2

any text5
any text6

The above assumes every END is paired with a preceding BEGIN. With GNU awk for multi-char RS you could alternatively do:

$ awk -v RS='END\n' '{sub(/.*BEGIN\n/,"")} RT!="" && !/garbage text/' file
any text1
any text2

any text5
any text6
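
If the input might contain an END with no preceding BEGIN, one possible variant tracks an explicit in-block flag so that only lines between a BEGIN and its END are collected. This is a minimal sketch in POSIX awk, not part of the commands above; the function name `extract_clean_blocks` and the variable `inblock` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each BEGIN..END block (markers excluded)
# unless the block contains "garbage text".
extract_clean_blocks() {
    awk '
        # Start a new block and discard anything collected so far.
        /BEGIN/ { inblock = 1; rec = ""; next }

        # Close the block; print it only if it was opened and is clean.
        # rec already ends in ORS, so print adds a blank separator line.
        /END/ {
            if (inblock && rec !~ /garbage text/)
                print rec
            inblock = 0
            next
        }

        # Accumulate only the lines that sit inside a block.
        inblock { rec = rec $0 ORS }
    ' "$@"
}
```

Called as `extract_clean_blocks file`, it should skip a stray END instead of printing whatever text preceded it. Like the commands above, it treats any line containing BEGIN or END as a marker.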

btw instead of:

awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log

your original code should be just:

awk '/END/{f=0} f; /BEGIN/{f=1}' file.log
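
The reordered form needs no `next` because the order of the clauses alone decides whether the marker lines get printed. As a small sketch of that idea (the function names are hypothetical, chosen only for this illustration), the two orderings can be compared side by side:

```shell
#!/bin/sh
# Hypothetical helpers showing how clause order controls marker printing.

# Markers excluded: the flag is cleared before the print test
# and set only after it, so neither BEGIN nor END is printed.
between_exclusive() {
    awk '/END/{f=0} f; /BEGIN/{f=1}' "$@"
}

# Markers included: the flag is set before the print test
# and cleared only after it, so both BEGIN and END are printed.
between_inclusive() {
    awk '/BEGIN/{f=1} f; /END/{f=0}' "$@"
}
```

Both functions read the named files, or standard input when called without arguments.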

See "Printing with sed or awk a line following a matching pattern" for related idioms.

Ed Morton