Selecting lines between marker patterns where one pattern may occur twice

Question

If I have a file that contains some text data such as

PATTERN1
TEXT1
PATTERN1
TEXT2
PATTERN2

How would I select the TEXT2 data from this file, given that I know PATTERN1 and PATTERN2? I have tried using awk as mentioned here, but it prints both TEXT1 and TEXT2.

score 2 · Answer 1 · answered Mar 18 '20 at 22:16

If TEXT2 is a single line that is always directly surrounded by PATTERN1 and PATTERN2, you can use grep:

grep -B2 "PATTERN2" file | grep -A1 "PATTERN1" | grep -v "PATTERN1"

grep -B2 "PATTERN2" -> grab PATTERN2 and the preceding 2 lines
grep -A1 "PATTERN1" -> from these three lines, grab PATTERN1 and the line after
grep -v "PATTERN1" -> get rid of the line(s) containing PATTERN1, and you are left with TEXT2
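To make the three stages easier to follow, here is a minimal sketch that runs each one separately on the sample from the question. It assumes a grep that supports -A and -B (GNU and BSD grep do), and the variable names sample, stage1, stage2 and stage3 are only for illustration. The input is piped in on stdin instead of being read from a file.

```shell
# Sample data from the question, one marker or text item per line.
sample=$(printf '%s\n' PATTERN1 TEXT1 PATTERN1 TEXT2 PATTERN2)

# Stage 1: PATTERN2 plus the two lines before it.
stage1=$(printf '%s\n' "$sample" | grep -B2 "PATTERN2")
# Stage 2: from those lines, PATTERN1 plus the line after it.
stage2=$(printf '%s\n' "$stage1" | grep -A1 "PATTERN1")
# Stage 3: drop the PATTERN1 line, which leaves TEXT2.
stage3=$(printf '%s\n' "$stage2" | grep -v "PATTERN1")

printf 'stage1:\n%s\nstage2:\n%s\nstage3:\n%s\n' "$stage1" "$stage2" "$stage3"
```

Because -B2 and -A1 are fixed line counts, this only holds while TEXT2 is exactly one line.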

score 2 · Answer 2 · answered Mar 18 '20 at 23:00

$ awk '
inBlock {
    if ( /PATTERN2/ ) {
        printf "%s", block
        inBlock = 0
    } else {
        block = block $0 ORS
    }
}
/PATTERN1/ {
    inBlock = 1
    block = ""
}
' file
TEXT2

Ed Morton

potong · Answer 3 · 2020-03-21T11:38:01.470

This might work for you (GNU sed):

sed '/PATTERN1/{z;x;d};/PATTERN2/!{H;d};g;s/.//p;d' file

If the current line contains PATTERN1, clear the line (z), swap it with the hold space (HS) so that the HS is emptied (x), and delete the line (d).

If the current line does not contain PATTERN2, append it to the HS and delete the line.

If the current line contains PATTERN2, replace it by the contents of the HS, remove the first character (which will be an introduced newline), print the result and delete the line.
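The three steps above can be traced on the sample from the question. This is only a sketch, assuming GNU sed (the z command is a GNU extension); the variable names sample and result are illustrative, and the input is piped in on stdin.

```shell
sample=$(printf '%s\n' PATTERN1 TEXT1 PATTERN1 TEXT2 PATTERN2)

# Hold space (HS) after each input line:
#   PATTERN1 -> HS emptied
#   TEXT1    -> HS = "\nTEXT1"
#   PATTERN1 -> HS emptied again, so TEXT1 is forgotten
#   TEXT2    -> HS = "\nTEXT2"
#   PATTERN2 -> HS copied over the line, leading newline removed, printed
result=$(printf '%s\n' "$sample" |
    sed '/PATTERN1/{z;x;d};/PATTERN2/!{H;d};g;s/.//p;d')

printf '%s\n' "$result"   # prints TEXT2
```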

Alternative:

sed -En '/PATTERN1/{:a;/PATTERN1/z;N;/PATTERN2/!ba;s/.(.*)\n.*/\1/p}' file

The first solution presupposes that the file will contain both PATTERN1 and PATTERN2; the second does not.
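A quick way to see the difference is to feed both commands an input that has no PATTERN1 at all. The two-line input below is a made-up example, not from the question, and the sketch assumes GNU sed.

```shell
# Hypothetical input: PATTERN2 is present but PATTERN1 never occurs.
no_start=$(printf '%s\n' TEXT1 PATTERN2)

# First solution: everything before PATTERN2 has accumulated in the hold space.
first=$(printf '%s\n' "$no_start" |
    sed '/PATTERN1/{z;x;d};/PATTERN2/!{H;d};g;s/.//p;d')

# Second solution: nothing happens until a PATTERN1 line is seen.
second=$(printf '%s\n' "$no_start" |
    sed -En '/PATTERN1/{:a;/PATTERN1/z;N;/PATTERN2/!ba;s/.(.*)\n.*/\1/p}')

printf 'first:  [%s]\nsecond: [%s]\n' "$first" "$second"
```

Here the first command prints TEXT1 even though no opening marker exists, while the second prints nothing.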

KamilCuk · Answer 4 · 2020-03-19T15:35:55.227

If PATTERN2 can occur multiple times, this extracts only the innermost block (the PATTERN1 and PATTERN2 lines are included in the output):

sed '/PATTERN1/h;//!H;/PATTERN2/!d;//{x;/PATTERN1/!d}'

If PATTERN2 can occur only once, you can use this sed script:

sed -n '/PATTERN1/h;//!H;/PATTERN2/{x;p}' input_file.txt

or:

sed '/PATTERN1/h;//!H;/PATTERN2/!d;//x'

You can reverse the lines, then use sed with 2 addresses and reverse lines again:

tac input_file.txt | sed -n '/PATTERN2/,/PATTERN1/p' | tac
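The range printed by this pipeline still includes the PATTERN1 and PATTERN2 lines. If only the inner text is wanted, one possible follow-up (my own addition, not part of the original command) is to drop the first and last line of the result. A sketch on the sample from the question, assuming tac from GNU coreutils is available:

```shell
sample=$(printf '%s\n' PATTERN1 TEXT1 PATTERN1 TEXT2 PATTERN2)

# Reverse, take PATTERN2 up to the nearest PATTERN1, reverse back.
block=$(printf '%s\n' "$sample" | tac | sed -n '/PATTERN2/,/PATTERN1/p' | tac)

# Assumed extra step: strip the two marker lines at either end of the block.
inner=$(printf '%s\n' "$block" | sed '1d;$d')

printf 'block:\n%s\ninner:\n%s\n' "$block" "$inner"
```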

With sed -z (a GNU sed option) we could remove everything before and after the patterns, since the regex is greedy:

sed -z 's/.*\(PATTERN1\n\)/\1/;s/\(PATTERN2\n\).*/\1/g'

answered Mar 19 '20 at 15:22

If I run the first solution against the example, the result contains both `PATTERN1` and `PATTERN2`; I was under the impression that the OP wanted these removed. – potong Mar 21 '20 at 11:19

score 0 · Answer 5 · answered Mar 18 '20 at 21:56

Perl to the rescue!

perl -ne 'print(@buffer), $inside = @buffer = () if /PATTERN2/;
          push @buffer, $_ if $inside;
          @buffer = (), $inside = 1 if /PATTERN1/;
' -- file.txt

We keep an array of lines to output in @buffer. We also keep a flag $inside that's set to true if we've met PATTERN1, but not PATTERN2 yet.

If we see PATTERN2, we print the buffer and clear the flag.
If we are inside, we remember the current line.
If we see PATTERN1, regardless of whether we've seen it before or not, we clear the buffer and set the flag.
