Print all lines between two patterns (which may not be pairs) with AWK/SED/GREP

Question

I know this question has come up several times, for example in "Print all lines between two patterns, exclusive, first instance only (in sed, AWK or Perl)", but my case is different: the two patterns may not come in matched pairs. For instance, given the input

PATTERN1
bbb
ccc
ddd
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
PATTERN2
kkk

I would expect the shortest range as output:

ggg
hhh
iii

Is this possible?

RavinderSingh13 · Accepted Answer · 2020-10-26T06:03:26.020

Try the following, written and tested in GNU awk against your sample input only.

awk '
/PATTERN1/ && found1 && !found2{
  found1=found2=val=""
}
/PATTERN1/{
  found1=1
  next
}
/PATTERN2/{
  found2=1
  if(found1){
    print val
  }
  found1=found2=val=""
  next
}
found1{
  val=(val?val ORS:"")$0
}
' Input_file

Output for the given sample:

ggg
hhh
iii

Explanation: a line-by-line walkthrough of the code above.

awk '                              ##Start of the awk program.
/PATTERN1/ && found1 && !found2{   ##If PATTERN1 is on the current line while found1 is set and found2 is not set, then do the following.
  found1=found2=val=""             ##Reset found1, found2 and val, discarding the lines buffered so far.
}
/PATTERN1/{                        ##If PATTERN1 is found, then do the following.
  found1=1                         ##Set found1 as a flag.
  next                             ##Skip all further statements for this line.
}
/PATTERN2/{                        ##If PATTERN2 is found, then do the following.
  found2=1                         ##Set found2 as a flag.
  if(found1){                      ##If found1 is set, then do the following.
    print val                      ##Print val, the buffered lines.
  }
  found1=found2=val=""             ##Reset found1, found2 and val.
  next                             ##Skip all further statements for this line.
}
found1{                            ##Only after PATTERN1 has been seen, do the following.
  val=(val?val ORS:"")$0           ##Append the current line to val, separated by ORS (a newline by default).
}
' Input_file                       ##Input_file is the name of the input file.
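
If the markers are something other than the literal PATTERN1 and PATTERN2, the same buffer-and-reset idea can be wrapped in a small shell function. This is only a sketch, not the code from the answer: the function name `between` and the awk variables `start`, `end`, `inside`, `buf` and `n` are made up for illustration, and both arguments are treated as awk regular expressions.

```shell
# Hypothetical helper: between START_REGEX END_REGEX [file...]
# Prints the lines between the most recent START line and the next END line.
between() {
  btw_start=$1
  btw_end=$2
  shift 2
  awk -v start="$btw_start" -v end="$btw_end" '
    $0 ~ start { inside = 1; buf = ""; n = 0; next }   # (re)start: drop the stale buffer
    $0 ~ end   {
      if (inside && n) print buf                       # print only an opened, non-empty range
      inside = 0; buf = ""; n = 0
      next
    }
    inside     { buf = (n++ ? buf ORS : "") $0 }       # buffer lines while inside a range
  ' "$@"
}
```

With the sample input saved as Input_file, `between PATTERN1 PATTERN2 Input_file` prints ggg, hhh and iii. Since there is no `exit`, this sketch prints every such range in the input, not just the first one.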

It took me a minute to snap to you using `PATTERN1[2]` and that's how you did it without an array `:)` — David C. Rankin, Oct 26 '20 at 05:55
@DavidC.Rankin, yeah working for shown samples and I hope this should help OP cheers :) — RavinderSingh13, Oct 26 '20 at 05:56
Yep, that works like a champ. I just wasn't sure how attached the OP was to the exact patterns listed. No reason not to be. — David C. Rankin, Oct 26 '20 at 05:58
thanks! @DavidC.Rankin it's working great! sorry for the late reply :p i kind of found a hack too: awk 'f{x=x"\n"$0} /from_pattern/{x=$0; f=1} f && /to_pattern/{print x; f=0}' path_to_file — pommelo, Nov 06 '20 at 15:31

score 2 · Answer 2 · answered Oct 26 '20 at 05:52

In awk you can do this by remembering the last marker line (PATTERN..) and comparing it with each new marker line you encounter. Lines between markers are saved in an array. When a marker differs from the previous one, you print the contents of the array. Either way, you then empty the array, reset the counter, and remember the new marker, e.g.

awk '! /PATTERN/ {
        a[++n]=$0
    }
    /PATTERN/ {
        if ($0 != lastptrn)
            for (i=1; i<=n; i++)
                print a[i]
        delete a
        n=0
        lastptrn=$0
    }
' file

Output

ggg
hhh
iii

score 2 · Answer 3 · answered Oct 26 '20 at 06:48

If Perl is an option, you could try:

perl -0777 -ne '/.*PATTERN1\n(.*?)PATTERN2/s && print $1' input

Result:

ggg
hhh
iii

-0777 tells Perl to slurp the whole file at once.
The s modifier on the regex lets the metacharacter . match newline characters as well.
.*PATTERN1\n is greedy, so it advances the match position past the last PATTERN1 line that is still followed by a PATTERN2.
(.*?) is a non-greedy match, so it captures only the lines up to the next PATTERN2 and stores them in $1.

tshiono

Perl is the right tool for such problems.. thumbs up!.. – stack0114106 Oct 26 '20 at 12:06

score 0 · Answer 4 · answered Oct 26 '20 at 07:33

Another:

$ awk '
/PATTERN1/ {                # at starting pattern
    f=1                     # flag up
    b=""                    # reset buffer
    next                    # to exclude the start pattern
}
/PATTERN2/ && f {           # at ending pattern, once a start was seen
    print b                 # output buffer
    exit                    # no need to continue to the end
}
f {                         # when flag up
    b=b (b==""?"":ORS) $0   # buffer records
}' file

To include the starting and ending markers in the output, remove the next statement and move the f {...} block above the PATTERN2 block.
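
The change described above could look like the sketch below. The wrapper name `range_with_markers` is made up for illustration, and the `f &&` guard on the end pattern is an extra safeguard assumed here, so that a PATTERN2 appearing before any PATTERN1 is ignored.

```shell
# Hypothetical wrapper: prints the shortest range including both marker lines.
range_with_markers() {
  awk '
    /PATTERN1/ {                        # at starting pattern
      f = 1                             # flag up
      b = ""                            # reset buffer (no "next", so the marker is kept)
    }
    f {                                 # when flag up
      b = b (b == "" ? "" : ORS) $0     # buffer records, markers included
    }
    f && /PATTERN2/ {                   # at ending pattern, only after a start was seen
      print b                           # output buffer
      exit                              # no need to continue to the end
    }
  ' "$@"
}
```

On the sample input from the question, `range_with_markers file` prints the second PATTERN1 line, then ggg, hhh, iii, then the first PATTERN2 line.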

potong · Answer 5 · 2020-10-26T12:06:04.937

This might work for you (GNU sed):

sed -n '/PATTERN2/{g;/PATTERN1/{s/[^\n]*\n//p;q}};H;/PATTERN1/h' file

Overview: Copy the lines from PATTERN1 up to, but not including, PATTERN2 into the hold space, then print the hold space minus its first line.

Processing: Append every line to the hold space, but when a line matches PATTERN1, replace the hold space with that line instead, so the buffer always starts at the most recent PATTERN1.

When a line matches PATTERN2, overwrite the pattern space with the hold space. If the pattern space now contains PATTERN1, remove its first line, print what remains, and quit.
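
The same one-liner can be laid out as a commented multi-line script, which makes the steps above easier to follow. This is a sketch for GNU sed only (it relies on `\n` inside the bracket expression, which GNU sed reads as a newline); the wrapper name `shortest_range_sed` is made up for illustration.

```shell
# Hypothetical wrapper around the GNU sed command; takes file names or reads stdin.
shortest_range_sed() {
  sed -n '
    /PATTERN2/ {
      # Overwrite the pattern space with the hold space.
      g
      /PATTERN1/ {
        # Delete the first line (the PATTERN1 marker), print the rest, and quit.
        s/[^\n]*\n//p
        q
      }
    }
    # Append the current pattern space to the hold space.
    H
    # On a PATTERN1 line, restart the hold space from this line.
    /PATTERN1/ h
  ' "$@"
}
```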
