bash search text between first occurrence of pattern1 and last occurrence of pattern2

Question

I am trying to get all lines between the first occurrence of pattern1 and the last occurrence of pattern2. Both patterns are regexes.

Example input

TEXT
TEXT
[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]
TEXT
TEXT

Output that I am expecting is

[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]

The patterns are XXX_START and XXX_END.

What I have so far is

cat /u01/app/oracle/admin/LNOPP1P/config/dbbackup_LNOPP1P.config | sed -n -e '/[[A-Z][A-Z][A-Z]_START]/,/[[A-Z][A-Z][A-Z]_END]/p'

But this does not keep the blank lines between the blocks and displays everything together, like this:

[SUN_START]
[SUN_END]
[MON_START]
TEXT
[MON_END]
[TUE_START]
[TUE_END]
[WED_START]
TEXT
[WED_END]

I also want to make sure that it only matches lines that start with [[A-Z]_START], and the same for END.

anishsane · Answer 1 · 2017-11-07T17:27:30.870

1

This awk should work:

awk '/_START\]/{p=1} p{a = a $0 ORS}/_END\]/{printf "%s", a; a="";}' file

Simple logic:

At the first *_START tag, enable p=1. This will discard those lines before the first *_START tag.
For every line, append the current line to a local variable.
At every *_END tag, print the local variable and empty it.
Since we are printing only at the *_END tag, those lines after the last *_END are not printed.
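The steps above can be sketched as a small self-contained script. This is only an illustration: the temporary file, the `sample` and `result` shell variables, and the awk variable name `buf` (in place of `a`) are assumptions added here, and the sample data is copied from the question.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
TEXT
TEXT
[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]
TEXT
TEXT
EOF

result=$(awk '
    /_START\]/ { p = 1 }                        # first start tag seen: begin buffering
    p          { buf = buf $0 ORS }             # append every line once buffering is on
    /_END\]/   { printf "%s", buf; buf = "" }   # flush the buffer at each end tag
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Lines after the last end tag are still appended to the buffer, but since nothing flushes it again they are never printed.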

edited Nov 07 '17 at 17:27

answered Nov 07 '17 at 15:10

anishsane

1

^^ Right. Editing. – anishsane Nov 07 '17 at 17:26

Jose Ricardo Bustos M. · Answer 2 · 2017-11-07T16:00:44.627

1

A solution without awk, using grep

grep -Pzo '(?s)\[([A-Z]{3})_START\].*\n.*\[\1_END\]' file | sed 's/\x00/\n\n/'

You get:

[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]

*Based on @albfan's answer

edited Nov 07 '17 at 16:00

answered Nov 07 '17 at 15:42

Jose Ricardo Bustos M.

1

You should mention that this uses GNU sed and GNU grep only, and that `-P` is "highly experimental" according to the man page. – Ed Morton Nov 07 '17 at 17:14
I did not know `grep` also supported `-z`. I knew `sed` did, but that does not support non-greedy matching. So I could not make it work. Thanks. +1. – anishsane Nov 07 '17 at 17:30

score 0 · Answer 3 · answered Nov 07 '17 at 14:49

0

You could use awk:

awk '/\[..._START\]/{p=1}/\[..._END\]/{print;p=0}p||!NF' file

The variable p is set while printing is needed. !NF keeps the blank lines.
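As a rough illustration of how the three rules interact, here is the same one-liner spread out with comments and run on a shortened version of the question's sample. The `input` and `out` variable names and the shortened data are assumptions made for this sketch.

```shell
# Shortened sample: one leading and one trailing TEXT line around two blocks.
input=$(printf '%s\n' 'TEXT' '[SUN_START]' '[SUN_END]' '' \
                      '[MON_START]' 'TEXT' '[MON_END]' 'TEXT')

out=$(printf '%s\n' "$input" | awk '
    /\[..._START\]/ { p = 1 }          # start tag: turn printing on
    /\[..._END\]/   { print; p = 0 }   # end tag: print it, then turn printing off
    p || !NF                           # print while p is set, and any blank line
')

printf '%s\n' "$out"
```

Because `!NF` is true for every empty line, blank lines outside the first-start to last-end range would be printed as well; this sample simply has none there.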

answered Nov 07 '17 at 14:49

oliv

And this would only bring back results where the line starts with [..._START] ? – Prashant Nov 07 '17 at 14:59
Is it simple to invert this selection? select everything but the bit selected by this AWK ? – Prashant Nov 07 '17 at 16:22
That would print blank lines before/after the target area and would not print non-blank lines between an _END..._START within the target area. – Ed Morton Nov 07 '17 at 17:13

Ed Morton · Answer 4 · 2017-11-07T17:06:12.463

IMHO a two-pass approach without saving the contents in memory is the simplest and most robust:

$ awk '
    NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
    FNR>=beg && FNR<=end
' file file
[SUN_START]
[SUN_END]

[MON_START]
TEXT
[MON_END]

[TUE_START]
[TUE_END]

[WED_START]
TEXT
[WED_END]

Consider using [[:upper:]] instead of [A-Z] for portability across locales.
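A minimal sketch of that variant, with the regexes also anchored using `^` so that a tag only counts when it starts the line (which the question asked for). The temporary file, the `tmp` and `anchored` variable names, and the first sample line with a mid-line tag are hypothetical additions for illustration; some older awk implementations may not support POSIX character classes such as `[[:upper:]]`.

```shell
tmp=$(mktemp)
# The first line carries a tag in the middle of the line; it must not count as a start.
printf '%s\n' 'note: [XXX_START] mid-line' 'TEXT' '[SUN_START]' '[SUN_END]' '' \
              '[WED_START]' 'TEXT' '[WED_END]' 'TEXT' > "$tmp"

anchored=$(awk '
    NR == FNR {                                               # first pass: find the range
        if (/^\[[[:upper:]]+_START\]/ && !beg) beg = NR       # first start tag only
        if (/^\[[[:upper:]]+_END\]/) end = NR                 # keeps the last end tag
        next
    }
    FNR >= beg && FNR <= end                                  # second pass: print the range
' "$tmp" "$tmp")

printf '%s\n' "$anchored"
rm -f "$tmp"
```

Without the `^` anchor, the first line of this sample would be taken as the start of the range.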

I just saw you had this comment under a different answer:

Is it simple to invert this selection? select everything but the bit selected by this AWK ?

and the answer is "of course", just change the condition at the end of the script:

$ awk '
    NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
    FNR<beg || FNR>end
' file file
TEXT
TEXT
TEXT
TEXT

or keep the original condition but make its action "next" and add a default "print" for every other line to hit:

$ awk '
    NR==FNR { if (/\[[A-Z]+_START\]/ && !beg) beg=NR; if (/\[[A-Z]+_END\]/) end=NR; next }
    FNR>=beg && FNR<=end { next }
    { print }
' file file
TEXT
TEXT
TEXT
TEXT
