Extract lines between two patterns by performing exact match for the 1st pattern only

Question

I'm trying to extract, from a large file, the lines located between two marker lines, each identified by a pattern, say pattern1 and pattern2. My code:

awk "/pattern1/{flag=1;next}/pattern2/{flag=0}flag" filename

checks whether "pattern1" occurs in a line and starts printing after that line, until it finds a subsequent line containing the string "pattern2".

What I would like is for awk to begin printing at a line that matches the string "pattern1" exactly, and to stop printing at a line that merely contains "pattern2" (no exact matching). In other words, I would like exact matching for the first pattern while keeping the matching behavior of the command above for the second pattern.

riteshtch · Answer 1 · 2016-05-21T17:03:34.043

awk has that functionality built in, like this:

$ cat data 
abcd
pattern1
xyz
pattern2
abcde
$ awk '/pattern1/,/pattern2/' data
pattern1
xyz
pattern2

And sed has it too:

$ sed -n '/pattern1/,/pattern2/p' data
pattern1
xyz
pattern2

Edit: to match pattern1 exactly you will have to use some sort of anchor, either the word boundary \y in gawk or start and end anchors like this:

$ cat data 
abcd
pattern1 234
pattern1
xyz
pattern2
abcde
$ awk '/^pattern1$/,/pattern2/' data 
pattern1
xyz
pattern2

And if you want combinations of printing or not printing the pattern1/pattern2 lines you can use these:

$ awk '/^pattern1$/{flag=1} /pattern2/{flag=0}flag' data 
pattern1
xyz
$ awk '/^pattern1$/{flag=1;next} /pattern2/{flag=0}flag' data 
xyz
$ awk '/^pattern1$/{flag=1;next;} /pattern2/{flag=0;print}flag' data 
xyz
pattern2
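
One caveat with the last variant: /pattern2/{flag=0;print} prints every line containing pattern2, whether or not a block is open. Here is a minimal sketch of a guarded version, assuming a made-up sample file with a stray pattern2 line before the block (the file contents and the variable names are only for illustration):

```shell
# Hypothetical sample file: a stray pattern2 line appears before the block starts.
data=$(mktemp)
printf '%s\n' 'stray pattern2' 'pattern1 234' 'pattern1' 'xyz' 'pattern2' 'abcde' > "$data"

# Variant from above: prints every pattern2 line, inside a block or not.
unguarded=$(awk '/^pattern1$/{flag=1;next} /pattern2/{flag=0;print} flag' "$data")

# Guarded variant: print the pattern2 line only while a block is open.
guarded=$(awk '/^pattern1$/{flag=1;next} /pattern2/{if(flag)print; flag=0} flag' "$data")

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
rm -f "$data"
```

With this input the unguarded command also emits the stray line, while the guarded one should only emit xyz and pattern2.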

If we add "pattern1 234" between the first line and the second line of your data file, then `awk '/pattern1/,/pattern2/' data` will print "pattern1 234", "pattern1", "xyz", and "pattern2". As I said in my post, I would like to do EXACT matching for the first pattern and DEFAULT/CLASSIC matching for the second. The command I put in my post does about the same as yours except that it does not print the lines containing the patterns (which your code does). – dada May 21 '16 at 16:30
@dada well for that you will have to show us your input data and regex patterns so that we can do those exact matches. If you want whole word matching, you can use `gawk` with word boundary `\y` or put anchors like this: `awk '/^pattern1$/,/pattern2/' data` which will print what you want – riteshtch May 21 '16 at 16:44

Jérôme Kunegis · Answer 2 · 2016-05-21T17:29:34.080

Here's another answer in line with the suggestion in the question:

awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{flag=0;next} {if (flag == 1) {print}}'

The first pattern must match the full line exactly (using ^ and $), while the second pattern can appear anywhere within the line.

EDIT: This version does print the lines on which pattern1 appears. If you want to not print them, replace "flag=1;print;next" by "flag=1;next".
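
To make the behavior concrete, here is a quick sketch run against a made-up input file (the sample lines and variable names are assumptions for illustration, not data from the question). It runs both the command above and the variant from the EDIT:

```shell
# Hypothetical sample file (not from the question).
data=$(mktemp)
printf '%s\n' 'abcd' 'pattern1 234' 'pattern1' 'xyz' 'as pattern2 sd' 'abcde' > "$data"

# Start line included: /^pattern1$/ skips "pattern1 234", /pattern2/ matches mid-line.
with_start=$(awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{flag=0;next} {if (flag == 1) {print}}' "$data")

# Start line excluded, as described in the EDIT.
without_start=$(awk 'BEGIN{flag=0} /^pattern1$/{flag=1;next} /pattern2/{flag=0;next} {if (flag == 1) {print}}' "$data")

printf 'with start line:\n%s\n\nwithout start line:\n%s\n' "$with_start" "$without_start"
rm -f "$data"
```

The line "pattern1 234" does not open a block because of the anchors, and "as pattern2 sd" closes it without being printed.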

I-V · Answer 3 · 2016-05-22T18:53:06.453

awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' filename

This way you can avoid printing double "pattern2":

me:~$ awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' a
pattern1
xyz
as pattern2 sd

me:~$ cat a
abcd
pattern1 23
pattern1
xyz
as pattern2 sd
abcde
pattern2

I posted my answer hours before you did... how can you tell me my answer doesn't consider yours when your answer didn't exist when I posted it? – I-V May 22 '16 at 18:23
I have already changed it... in that you are certainly right! It was an arrogant statement. – I-V May 22 '16 at 18:51

score 0 · Answer 4 · edited May 23 '17 at 12:23

Without sample input/output it's a guess but this MAY be what you want:

awk '/pattern2/{flag=0} flag; $0=="pattern1"{flag=1}' filename

which could be written more meaningfully as:

awk '/end_regexp/{found=0} found; $0=="start_string"{found=1}' filename

(Nbd but naming a flag flag is as useful as naming a function function!)

I actually think this might be what you REALLY should be using but idk:

awk 'index($0,"end_string"){found=0} found; $0=="start_string"{found=1}' filename
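
The difference between the regexp and the index() versions only shows up when the end marker contains regexp metacharacters. A rough sketch, assuming a made-up end marker "a.c" and a made-up start line "start" (neither comes from the question):

```shell
# Hypothetical input: "abc" matches the regexp a.c but does not contain the string "a.c".
data=$(mktemp)
printf '%s\n' 'start' 'one' 'abc' 'two' 'a.c' 'three' > "$data"

# Regexp end marker: the dot matches any character, so "abc" ends the block early.
by_regexp=$(awk '/a.c/{found=0} found; $0=="start"{found=1}' "$data")

# String end marker: index() looks for the literal text "a.c" only.
by_string=$(awk 'index($0,"a.c"){found=0} found; $0=="start"{found=1}' "$data")

printf 'regexp end:\n%s\n\nstring end:\n%s\n' "$by_regexp" "$by_string"
rm -f "$data"
```

So if the end marker is really a fixed string rather than a regexp, the index() form is probably the safer choice.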

See also https://stackoverflow.com/a/18409469/1745001 for more ways to find text using awk.
