95

I have a file like the following and I would like to print the lines between two given patterns PAT1 and PAT2.

1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block

I have read How to select lines between two marker patterns which may occur multiple times with awk/sed but I am curious to see all the possible combinations of this, either including or excluding the pattern.

How can I print all lines between two patterns?

fedorqui
  • I am posting an attempt at a canonical answer to [How to select lines between two marker patterns which may occur multiple times with awk/sed](http://stackoverflow.com/a/17988834/1983854) so that all cases are covered. I followed [It's OK to Ask and Answer Your Own Questions](http://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/) and posted the answer as Community Wiki, so feel free to improve it! – fedorqui Aug 16 '16 at 10:41
  • 2
    @Cyrus yes, thank you! I also checked this one before going ahead and posting this question/answer. The point here is to provide a set of tools on this, since the volume of comments (and votes on them) in [my other answer](http://stackoverflow.com/a/17988834/1983854) led me to think that a generic post would be of good help to future readers. – fedorqui Aug 16 '16 at 10:49
  • See also http://www.thelinuxrain.com/articles/how-to-use-flags-in-awk – user2138595 Aug 16 '16 at 23:18
  • @fedorqui, I didn't hear back so I decided to have a go at improving the question to rank better on Google and clarifying what the scope is. Feel free to revert if you're not happy with it. – Alex Harvey Apr 20 '19 at 12:47
  • @Alex not sure where my comments back were expected, but in any case thanks for the edit! It looks fine to me. Thanks for taking the time on this – fedorqui Apr 20 '19 at 21:57

9 Answers

143

Print lines between PAT1 and PAT2

$ awk '/PAT1/,/PAT2/' file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

Or, using variables:

awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file

How does this work?

  • /PAT1/ and /PAT2/ match lines containing the respective text.
  • /PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
  • /PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
  • flag is a pattern with the default action, which is to print $0: if flag is equal to 1, the line is printed. This way, it prints every line from the one where PAT1 occurs up to and including the next line where PAT2 occurs. It also prints the lines from the last match of PAT1 up to the end of the file when no closing PAT2 follows.
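The flag idiom also works when the markers are not hard-coded. The following is a minimal sketch, assuming the patterns arrive as shell strings; the variable names start_re and end_re are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Sample input copied from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

# start_re and end_re are hypothetical names chosen for this sketch.
# "$0 ~ var" performs a dynamic regex match against the whole line.
out=$(awk -v start_re='PAT1' -v end_re='PAT2' '
    $0 ~ start_re { flag = 1 }   # open the block on the start marker
    flag                         # print while the flag is set
    $0 ~ end_re   { flag = 0 }   # close the block after the end marker is printed
' "$sample")
printf '%s\n' "$out"
```

Note that awk processes escape sequences in -v assignments, so patterns containing backslashes may need extra care.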

Print lines between PAT1 and PAT2 - not including PAT1 and PAT2

$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3    - first block
4
7    - second block
10    - third block

This uses next to skip the line that contains PAT1 in order to avoid this being printed.

This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.

Print lines between PAT1 and PAT2 - including PAT1

$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block

Because flag is evaluated at the very end, it reflects whatever was set on the current line: the PAT1 line is printed (the flag has just been set), while the PAT2 line is not (the flag has just been cleared).

Print lines between PAT1 and PAT2 - including PAT2

$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Because flag is evaluated at the very beginning, it reflects the state left by the previous lines, and hence prints the closing pattern but not the starting one.
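The four variants above differ only in where flag is tested relative to the two assignments. As a rough sketch, they could be collected in one helper so the results are easy to compare; the function name between and its mode names are made up for this illustration.

```shell
#!/bin/sh
# Sample input copied from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

# between MODE FILE  (hypothetical helper, not from the original answer)
#   both  : print PAT1 and PAT2 lines too
#   none  : print neither marker
#   start : print PAT1 but not PAT2
#   end   : print PAT2 but not PAT1
between() {
    mode=$1
    file=$2
    case $mode in
        both)  awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$file" ;;
        none)  awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$file" ;;
        start) awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' "$file" ;;
        end)   awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' "$file" ;;
        *)     echo "usage: between both|none|start|end FILE" >&2; return 2 ;;
    esac
}

for mode in both none start end; do
    echo "== $mode"
    between "$mode" "$sample"
done
```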

Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs

This is based on a solution by Ed Morton.

awk 'flag{
        if (/PAT2/)
           {printf "%s", buf; flag=0; buf=""}
        else
            buf = buf $0 ORS
     }
     /PAT1/ {flag=1}' file

As a one-liner:

$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3    - first block
4
7    - second block

# note the lack of third block, since no other PAT2 happens after it

This keeps the selected lines in a buffer that starts being populated on the line after PAT1 is found and keeps being filled with the following lines until PAT2 is found. At that point, it prints the stored content and empties the buffer. If the file ends before PAT2 is seen, the buffer is never printed.
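The same buffering idea can be adapted to keep the markers while still dropping an unterminated final block. This is a sketch of one possible variant, not part of Ed Morton's solution; it assumes that a second PAT1 inside an open block should restart the buffer.

```shell
#!/bin/sh
# Sample input copied from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

out=$(awk '
    /PAT1/         { flag = 1; buf = "" }                       # (re)start a block at every PAT1
    flag           { buf = buf $0 ORS }                         # collect the line, markers included
    flag && /PAT2/ { printf "%s", buf; flag = 0; buf = "" }     # flush only once PAT2 closes the block
' "$sample")
printf '%s\n' "$out"
```

The third block is absent from the output because no PAT2 ever flushes its buffer.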

Community
fedorqui
83

What about the classic sed solution?

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

sed -n '/PAT1/,/PAT2/p' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
Any sed (see note 1)
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' FILE

or even (Thanks Sundeep):

GNU sed
sed -n '/PAT1/,/PAT2/{//!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{//!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT1 but not PAT2

The following includes just the range start:

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT2 but not PAT1

The following includes just the range end:

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p;}' FILE

Note 1: BSD/Mac OS X sed

BSD sed requires a ; (or a newline) before a closing }, so a command like this:

sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE

emits an error:

▶ sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
sed: 1: "/PAT1/,/PAT2/{/PAT1/!{/ ...": extra characters at the end of p command

For this reason this answer has been edited to include BSD and GNU versions of the one-liners.
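Another way to sidestep the brace-termination difference is to pass each command as its own -e fragment, so that every } starts on what sed treats as a new script line. This is a sketch based on my understanding that both GNU and BSD sed join -e fragments as separate lines; it has only been reasoned through against the sample file from the question.

```shell
#!/bin/sh
# Sample input copied from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

# Each -e fragment acts like one line of a sed script,
# so no command shares a line with a closing brace.
out=$(sed -n \
    -e '/PAT1/,/PAT2/{' \
    -e '/PAT1/!{' \
    -e '/PAT2/!p' \
    -e '}' \
    -e '}' "$sample")
printf '%s\n' "$out"
```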

Alex Harvey
hek2mgl
  • 2
    Hey, the classic is even shorter! – David C. Rankin Aug 16 '16 at 15:15
  • What about the case of the starting line also matching the end pattern (but perhaps not vice-versa)? That would break your 3rd case at least. – einpoklum Jan 08 '17 at 12:22
  • Then the start and end patterns are not well chosen or the regex needs to be more precise. – hek2mgl Jan 08 '17 at 13:12
  • 6
    not sure about other versions, but with GNU sed, the first one can be simplified to `sed -n '/PAT1/,/PAT2/{//!p}' file` ... from [manual](https://www.gnu.org/software/sed/manual/sed.html#Regexp-Addresses) `empty regular expression ‘//’ repeats the last regular expression match` – Sundeep Jun 20 '17 at 09:42
  • 1
    @Sundeep Thanks for the hint. POSIX says: `If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.` Looks like the only remaining question here is how to interpret `the last RE`. BSD is saying something to this. Look here (Point 23): https://github.com/freebsd/freebsd/blob/master/usr.bin/sed/POSIX – hek2mgl Jun 20 '17 at 13:04
  • 1
    @hek2mgl thanks for additional info... so if I understood correctly, `/PAT1/,/PAT2/{//!p}` will work only if last RE is dynamic.. if it was static, `//` would resolve to `/PAT2/` – Sundeep Jun 20 '17 at 13:12
  • 2
    Looks like. Hard to find an incompatible version to prove that. :) – hek2mgl Jun 20 '17 at 13:16
  • Note there is [a new answer](https://stackoverflow.com/a/55488022/1983854) suggesting improvements to this one. – fedorqui Apr 18 '19 at 05:48
  • @fedorqui, there's my best go at it. – Alex Harvey Apr 18 '19 at 13:54
  • 4
    @AlexHarvey I think it is a great example of kindness what you did here, by sharing your knowledge to improve other answers. Ultimately, this was my goal when I posted this question, so we could have a canonical ([yet another one :P](https://xkcd.com/927/)) set of sources. Many thanks! – fedorqui Apr 18 '19 at 14:00
  • 1
    @AlexHarvey Let me share my view on this: I once answered [How to select lines between two marker patterns which may occur multiple times...](https://stackoverflow.com/q/17988756/1983854) and kept getting quite a lot of comments asking for similar cases. Also, when being active in these tags I felt that I was reusing the same one-liners over and over again. For this I thought that a question-answer covering most of the cases could be useful. +25 stars, +30 votes, ~30K visits, lots of duplicates to this seem to agree with this. Of course it is not comprehensive but it seems to be working well. – fedorqui Apr 18 '19 at 21:42
  • If you compose the sed command from variables, then you must create it from a combination of parts, some in single quotes, and some in double quotes. Something like this: `sed -n "/$pattern1/,/$pattern2/"'{//!p}'`. The bash shell will not expand the variables if they are in single quotes. But if you contain the whole command in double quotes, bash will interpret `!` as a history command, and will expand it. So that part of the sed command must be in single quotes. – markling Jun 06 '23 at 13:41
13

Using grep with PCRE (where available) to print markers and lines between markers:

$ grep -Pzo "(?s)(PAT1(.*?)(PAT2|\Z))" file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block
  • -P perl-regexp, PCRE. Not in all grep variants
  • -z Treat the input as a set of lines, each terminated by a zero byte instead of a newline
  • -o print only matching
  • (?s) DotAll, ie. dot finds newlines as well
  • (.*?) nongreedy find
  • \Z Match only at end of string, or before newline at the end
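One practical detail with -z: grep also terminates each printed match with a zero byte instead of a newline, which is easy to miss on a terminal. A small sketch, assuming a grep built with PCRE support, that converts those bytes back to newlines so the result can be piped to line-oriented tools:

```shell
#!/bin/sh
# Sample input copied from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

# Same regex as above, single-quoted so the shell leaves the backslash alone.
# tr turns the NUL that follows each match into a newline.
out=$(grep -Pzo '(?s)(PAT1(.*?)(PAT2|\Z))' "$sample" | tr '\0' '\n')
printf '%s\n' "$out"
```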

Print lines between markers excluding end marker:

$ grep -Pzo "(?s)(PAT1(.*?)(?=(\nPAT2|\Z)))" file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block
  • (.*?)(?=(\nPAT2|\Z)) nongreedy find with lookahead for \nPAT2 or \Z

Print lines between markers excluding markers:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(?=(\nPAT2|\Z)))" file
3    - first block
4
7    - second block
10    - third block
  • (?<=PAT1\n) positive lookbehind for PAT1\n

Print lines between markers excluding start marker:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(PAT2|\Z))" file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block
James Brown
9

For completeness, here is a Perl solution:

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and print' FILE

or:

perl -ne 'print if /PAT1/../PAT2/' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and !/PAT2/ and print' FILE

or:

perl -ne 'if (/PAT1/../PAT2/) {print unless /PAT1/ or /PAT2/}' FILE 

Print lines between PAT1 and PAT2 - exclude PAT1 only

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and print' FILE

Print lines between PAT1 and PAT2 - exclude PAT2 only

perl -ne '/PAT1/../PAT2/ and !/PAT2/ and print' FILE

See also:

  • Range operator section in perldoc perlop for more on the /PAT1/../PAT2/ grammar:

Range operator

...In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors.

  • For the -n option, which makes Perl behave like sed -n, see perldoc perlrun.

  • Perl Cookbook, 6.8 for a detailed discussion of extracting a range of lines.

Alex Harvey
8

Here is another approach

Include both patterns (default)

$ awk '/PAT1/,/PAT2/' file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

Mask both patterns

$ awk '/PAT1/,/PAT2/{if(/PAT2|PAT1/) next; print}' file
3    - first block
4
7    - second block
10    - third block

Mask start pattern

$ awk '/PAT1/,/PAT2/{if(/PAT1/) next; print}' file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Mask end pattern

$ awk '/PAT1/,/PAT2/{if(/PAT2/) next; print}' file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block
karakfa
7

Alternatively:

sed '/START/,/END/!d;//d'

This deletes all lines except those between and including START and END; then //d deletes the START and END lines themselves, since an empty regex // makes sed reuse the last regular expression it applied.

Daedelus
5

This is a footnote to the two top answers above (awk and sed). I needed to run this on a large number of files, so performance was important. I load-tested both answers by running each 10000 times:

sedTester.sh

for i in `seq 10000`;do sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' patternTester >> sedTesterOutput; done

awkTester.sh

 for i in `seq 10000`;do awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' patternTester >> awkTesterOutput; done

Here are the results:

zsh sedTester.sh  11.89s user 39.63s system 81% cpu 1:02.96 total
zsh awkTester.sh  38.73s user 60.64s system 79% cpu 2:04.83 total

The sed solution seems to be about twice as fast as the awk solution (on Mac OS).

aalosious
4

You can do what you want with sed by suppressing the normal printing of pattern space with -n. For instance to include the patterns in the result you can do:

$ sed -n '/PAT1/,/PAT2/p' filename
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

To exclude the patterns and just print what is between them:

$ sed -n '/PAT1/,/PAT2/{/PAT1/{n};/PAT2/{d};p}' filename
3    - first block
4
7    - second block
10    - third block

Which breaks down as

  • sed -n '/PAT1/,/PAT2/ - locate the range between PAT1 and PAT2 and suppress printing;

  • /PAT1/{n}; - if it matches PAT1 move to n (next) line;

  • /PAT2/{d}; - if it matches PAT2 delete line;

  • p - print all lines that fell within /PAT1/,/PAT2/ and were not skipped or deleted.

David C. Rankin
  • Thanks for the interesting one-liners and its breakdown! I have to admit I still prefer awk, it looks clearer to me :) – fedorqui Aug 16 '16 at 15:17
  • I got done sorting through this one only to find *hek2mgl* had a shorter way -- take a look at his *classic* `sed` solution. – David C. Rankin Aug 16 '16 at 15:19
3

This might work for you (GNU sed) on the proviso that PAT1 and PAT2 are on separate lines:

sed -n '/PAT1/{:a;N;/PAT2/!ba;p}' file

Turn off implicit printing by using the -n option and act like grep.

N.B. All solutions using the range idiom, i.e. the /PAT1/,/PAT2/ command, suffer from the same edge case: when PAT1 exists but PAT2 does not, they print from PAT1 to the end of the file.
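To make the edge case concrete, here is a short sketch (GNU sed assumed, as in this answer) that runs the range idiom and the N loop on the question's sample file, whose last PAT1 has no closing PAT2:

```shell
#!/bin/sh
# Sample input copied from the question; the last block is never closed.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block
EOF

# Range idiom: the unterminated block is printed through end of file.
range_out=$(sed -n '/PAT1/,/PAT2/p' "$sample")

# N loop: with -n, GNU sed prints nothing when N runs out of input,
# so the unterminated block is dropped.
loop_out=$(sed -n '/PAT1/{:a;N;/PAT2/!ba;p}' "$sample")

echo "== range idiom"
printf '%s\n' "$range_out"
echo "== N loop"
printf '%s\n' "$loop_out"
```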

For completeness:

# PAT1 to PAT2 without PAT1
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/^[^\n]*\n//p}' file 

# PAT1 to PAT2 without PAT2
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/\n[^\n]*$//p}' file 

# PAT1 to PAT2 without PAT1 and PAT2   
sed -n '/PAT1/{:a;N;/PAT2/!ba;/\n.*\n/!d;s/^[^\n]*\n\|\n[^\n]*$//gp}' file

N.B. In the last solution PAT1 and PAT2 may be on consecutive lines and therefore a further edge case may arise. IMO both are deleted and nothing printed.

anubhava
potong