3

Using sed, AWK (or Perl), how do you print all lines between (the first instance of) two patterns, exclusive of the patterns?¹

That is, given as input:

aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee

Or possibly even:

aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj

I would expect, in both cases:

bbb
ccc
ddd

¹ A number of users voted to close this question as a duplicate of this one. In the end, I provided a gist that proves they are different. The question is also superficially similar to a number of others, but none is an exact match and none is of high quality. Since I believe this specific problem is the one most commonly faced, it deserves a clear formulation and a set of correct, clear answers.

Alex Harvey
  • 2
    Meta on this: [Can I create this new question or will it be closed as a dupe or otherwise cause controversy?](https://meta.stackoverflow.com/q/382012/1983854). Quite strange it is not marked as duplicate of [How to select lines between two patterns?](https://stackoverflow.com/q/38972736/1983854). As mentioned in that one, the idea was to compile a set of options, and for this it was marked as CW. You are saying this is not dupe because one answer does not cover one case. Writing yet another canonical seems a waste of time to me and contributes to knowledge dispersion. – fedorqui Apr 17 '19 at 22:16
  • Mmm @hek I left my comment here and then some interesting debate was carried over with Alex, tripleee and me. I would just leave it open for now and see if it attracts views. In any case, I see we are talking about this topic asynchronously and in different places (also Meta), so it is difficult to reach a consensus. – fedorqui May 06 '19 at 18:01
  • 1
    @fedorqui I followed the discussion somehow here and there. For me this is a clear duplicate of your question - that was my first thought, without being influenced by the meta post you linked above. I don't see any good reason why the OP shouldn't accept that. – hek2mgl May 06 '19 at 18:09

6 Answers

6

If you have GNU sed (tested using version 4.7 on Mac OS X), the simplest solution could be:

sed '0,/PATTERN1/d;/PATTERN2/Q'

Explanation:

  • The d command deletes from line 1 to the line matching /PATTERN1/ inclusive.
  • The Q command then exits without printing on the first line matching /PATTERN2/.
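
The `0,/PATTERN1/` address form (a GNU extension) matters when PATTERN1 can appear on the very first line: with `1,/PATTERN1/`, the end pattern is only looked for from line 2 onward. A minimal sketch of the difference, assuming GNU sed; the variable names and the sample data are made up for illustration:

```shell
# Hypothetical input whose very first line is PATTERN1.
input='PATTERN1
bbb
PATTERN2
eee'

# 0,/PATTERN1/ lets the range end on line 1 itself, so only that line is deleted.
with_zero=$(printf '%s\n' "$input" | sed '0,/PATTERN1/d;/PATTERN2/Q')

# 1,/PATTERN1/ starts looking for the end on line 2; no later PATTERN1 exists,
# so the range runs to the end of the input and every line is deleted.
with_one=$(printf '%s\n' "$input" | sed '1,/PATTERN1/d;/PATTERN2/Q')

printf 'with 0: [%s]\nwith 1: [%s]\n' "$with_zero" "$with_one"
```

With this input the first command should print only `bbb`, while the second prints nothing.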

If the file has only one instance of each pattern, or if you don't mind extracting all of the blocks, and you want a solution that doesn't depend on a GNU extension, this works:

sed -n '/PATTERN1/,/PATTERN2/{//!p;}'

Explanation:

  • Note that the empty regular expression // reuses the last regular expression that was applied, so //!p prints every line of the range except the two delimiter lines.
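
To illustrate how `//` behaves here: on the PATTERN1 line the last regular expression applied is the range start, and on every later line of the range it is the range end, so `//!p` skips exactly the two delimiter lines. The sketch below runs the command on the question's second input (held in a made-up variable `input`). It also shows one possible way to stop after the first block by adding `/PATTERN2/q`; that variant is an assumption for illustration and is not part of the answer. The commands are written with a `;` before `}`, which some non-GNU seds require.

```shell
input='aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj'

# Prints the contents of every PATTERN1..PATTERN2 block.
all_blocks=$(printf '%s\n' "$input" | sed -n '/PATTERN1/,/PATTERN2/{//!p;}')

# Illustrative variant: quit at the first PATTERN2 that closes a block.
first_block=$(printf '%s\n' "$input" | sed -n '/PATTERN1/,/PATTERN2/{//!p;/PATTERN2/q;}')

printf '%s\n---\n%s\n' "$all_blocks" "$first_block"
```

The first command should print bbb through ddd and ggg through iii; the variant should stop after ddd.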
Alex Harvey
  • Note that this would only print the first such sequence of lines between the patterns. If that was your intention, please add that information to the question, and the duplicate marking will no longer hold. – Sundeep Mar 18 '19 at 12:29
  • 1
    Sorry @Sundeep, I believe I already said that, but I've now made it even clearer. – Alex Harvey Mar 18 '19 at 12:46
5

With awk (assumes that PATTERN1 and PATTERN2 always occur in pairs and that neither occurs inside a pair):

$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj

$ awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}' ip.txt
bbb
ccc
ddd
  • /PATTERN1/{f=1} set flag if /PATTERN1/ is matched
  • /PATTERN2/{exit} exit if /PATTERN2/ is matched
  • f; print input line if flag is set
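
The order of the three rules is what keeps both delimiter lines out of the output. A sketch of the same one-liner spread over several lines with comments; the `input` and `result` variables are only illustrative:

```shell
input='aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee'

result=$(printf '%s\n' "$input" | awk '
  # Checked first: stop before the closing line can be printed.
  /PATTERN2/ { exit }
  # Flag set: print the current line (a bare pattern means print).
  f
  # Checked last: the opening line sets the flag but is never printed itself.
  /PATTERN1/ { f = 1 }
')
printf '%s\n' "$result"
```

Moving the `/PATTERN1/` rule ahead of `f` would make the opening delimiter line print as well, which is why it comes last.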


Generic solution, where the block required can be specified

$ awk -v b=1 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
bbb
ccc
ddd
$ awk -v b=2 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
ggg
hhh
iii
Alex Harvey
Sundeep
  • 1
    It was proposed in chat to use `awk '/PATTERN1/{f=1;next}/PATTERN2/{exit}f'`, which I notice is essentially the same as `awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}'`, which is why I won't add it as a separate answer. – Alex Harvey Mar 21 '19 at 09:29
4

This might work for you (GNU sed):

sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}' file

This prints only the lines between the first set of delimiters, or if the second delimiter does not exist, to the end of the file.
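
The fallback described above can be checked with an input that has no closing delimiter. A small sketch, assuming GNU sed as in the answer; `input` is a made-up sample:

```shell
# Sample input with an opening delimiter but no PATTERN2.
input='aaa
PATTERN1
bbb
ccc'

# n fetches the next line, p prints it, and $!ba loops until the last line.
result=$(printf '%s\n' "$input" | sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}')
printf '%s\n' "$result"
```

Here everything after PATTERN1 (bbb and ccc) should be printed, since `q` is never reached.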

potong
2

I attempted twice to answer, but the questions kept switching between hold and duplicate status.

Borrowing input from @Sundeep and adding the answer which I shared in the question comments.

Using awk

awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' file

with Perl

perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } '

Results:

$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
PATTERN1
2
46
PATTERN2
xyz

$

$ awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' ip.txt
bbb
ccc
ddd

$ perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } ' ip.txt
bbb
ccc
ddd

$

To make it generic

With awk, where y selects which block to print:

awk -v x=0 -v y=2 ' /PATTERN1/ { x++;next } /PATTERN2/ { if(x==y) exit } x==y ' ip.txt
2
46

With Perl, compare ++$x against the desired occurrence (here it is 2):

perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if ++$x==2 } ' ip.txt
2
46
stack0114106
1

Adding a few more possible solutions, just for fun :) and without claiming that they are better than the usual ones. All were written and tested in GNU awk, and only against the examples given.

1st Solution:

awk -v RS="" -v FS="PATTERN2" -v ORS="" '$1 ~ /\nPATTERN1\n/{sub(/.*PATTERN1\n/,"",$1);print $1}' Input_file

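2nd solution (note that [^(PATTERN2)]* is a bracket expression that excludes those individual characters, not the string PATTERN2, so this relies on the lines in between containing none of them):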

awk -v RS="" -v ORS="" 'match($0,/PATTERN1[^(PATTERN2)]*/){val=substr($0,RSTART,RLENGTH);gsub(/^PATTERN1\n|^$\n/,"",val);print val}' Input_file

3rd solution:

awk -v RS="" -v OFS="\n" -v ORS="" 'sub(/PATTERN2.*/,"") && sub(/.*PATTERN1/,"PATTERN1"){$1=$1;sub(/^PATTERN1\n/,"")} 1' Input_file

All of the above produce the following output:

bbb
ccc
ddd
RavinderSingh13
1

Using GNU sed:

sed -nE '/PATTERN1/{:s n;/PATTERN2/q;p;bs}'

-n suppresses automatic printing, so only lines printed explicitly by the p command appear in the output. An address applies only to the single command that follows it, so the commands are grouped with {}. On the PATTERN1 line, the n command (meaning "next") drops that line and reads the following one. If that line matches PATTERN2, q quits immediately; otherwise p prints it and bs branches back to the label :s to continue with the next line.
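
The same loop can be laid out one command per line with comments, which may make the control flow easier to follow. This is only a sketch of the command above (without the -E flag, which this script does not need); the `input` and `result` variables are illustrative:

```shell
input='aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee'

result=$(printf '%s\n' "$input" | sed -n '
/PATTERN1/{
  # Loop start.
  :s
  # Replace the pattern space with the next input line (no output, because of -n).
  n
  # Closing delimiter reached: quit without printing it.
  /PATTERN2/q
  # Otherwise print the line and go around again.
  p
  bs
}')
printf '%s\n' "$result"
```

Because `q` ends the run at the first PATTERN2 inside the loop, later blocks in the input are never reached.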

Alex Harvey