Look for patterns across different lines

Question

I have a file like this (test.txt):

abc
12
34
def
56
abc
ghi
78
def
90

I would like to find the 78 that is enclosed by "abc\nghi" and "def". Currently, I know I can do this with:

cat test.txt | awk '/abc/,/def/' | awk '/ghi/,/def/'

Is there any better way?

though you wanted only *to **search** the 78* , what should be the final output? — RomanPerekhrest, Nov 14 '17 at 10:49
hmm..good point.. I thought the command OP tried was giving expected output.. but perhaps only lines between are needed, so I've edited my answer — Sundeep, Nov 14 '17 at 11:56

Sundeep · Accepted Answer · 2017-11-14T11:54:32.393

One way is to use flags

$ awk '/ghi/ && p~/abc/{f=1} f; /def/{f=0} {p=$0}' test.txt
ghi
78
def

{p=$0} saves the current input line so the next line can check it
/ghi/ && p~/abc/{f=1} sets the flag if the current line contains ghi and the previous line contains abc
f; prints the input record as long as the flag is set
/def/{f=0} clears the flag if the line contains def
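
To see how the flag moves, here is a small trace of the same rules. It is only a sketch: the printf rule replaces the bare f; so every line is shown with the flag value it would be filtered by, and the scratch file and the trace variable are names made up for this illustration.

```shell
# Recreate the sample input from the question in a scratch file.
tmp=$(mktemp)
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > "$tmp"

# Same rules as the answer, but each line is printed together with the
# flag value that the bare `f;` rule would test at that point.
trace=$(awk '
    /ghi/ && p ~ /abc/ { f = 1 }       # start: ghi directly after abc
    { printf "%s f=%d\n", $0, f }      # show the flag instead of filtering
    /def/ { f = 0 }                    # stop once the closing def was seen
    { p = $0 }                         # remember this line for the next one
' "$tmp")
printf '%s\n' "$trace"
rm -f "$tmp"
```

Only ghi, 78 and def should show f=1, which matches the three lines printed by the command above.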

If you only want the lines between these two boundaries

$ awk '/ghi/ && p~/abc/{f=1; next} /def/{f=0} f; {p=$0}' test.txt
78
$ awk '/12/ && p~/abc/{f=1; next} /def/{f=0} f; {p=$0}' test.txt
34

See also How to select lines between two patterns?
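
If the three patterns change often, the same idea can be wrapped in a shell function. This is a rough sketch, not part of the original answer: the function name between, its argument order and the variable names prev, start and stop are all invented here. It also assumes simple patterns, because awk -v interprets backslash escapes in the value.

```shell
# between PREV START STOP FILE
# Print the lines strictly between START and STOP, but only when the line
# directly before START matches PREV. Name and argument order are
# illustrative choices.
between() {
    awk -v prev="$1" -v start="$2" -v stop="$3" '
        $0 ~ start && p ~ prev { f = 1; p = $0; next }   # opening pair found
        $0 ~ stop              { f = 0 }                 # closing line
        f                                                # inside the block
        { p = $0 }                                       # remember previous line
    ' "$4"
}

# Try it on the sample input from the question.
sample=$(mktemp)
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > "$sample"
r1=$(between abc ghi def "$sample")   # expected: 78
r2=$(between abc 12 def "$sample")    # expected: 34
printf '%s\n' "$r1" "$r2"
rm -f "$sample"
```

Unlike the one-liner, this version updates p on the opening line as well, so a second START line immediately after the first one does not trigger the match again.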

score 0 · Answer 2 · answered Nov 14 '17 at 13:36

This is not really clean, but you can redefine your record separator as the regular expression abc\nghi\n|\ndef. This however creates multiple records, and you need to keep track of which one sits between the right separators. With GNU awk you can check which text matched RS for the current record using RT.

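awk 'BEGIN{RS="abc\nghi\n|\ndef"}
     (RT~/abc/){s=1; next}
     (s==1)&&(RT~/def/){print $0}
     {s=0}' file

This does the following:

set RS to abc\nghi\n or \ndef.
if RT contains abc, the record ended at the opening separator, so set s and skip to the next record (without the next, the last rule would reset s straight away and nothing would be printed).
if s is set and the RT of that next record contains def, print the record.
in every other case, reset s.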

score 0 · Answer 3 · answered Nov 14 '17 at 14:28

grep alternative

$ grep -Pazo '(?s)(?<=abc\nghi)(.*)(?=def)' file

but I think awk will be better

karakfa

GNU grep only. The `-P` option doesn't work in the BSDs (incl macOS), though `pcregrep` is often available as an add-on package. – ghoti Nov 16 '17 at 03:36

score 0 · Answer 4 · answered Nov 16 '17 at 03:52

You could do this with sed. It's not ideal in that it doesn't actually understand records, but it might work for you...

sed -Ene 'H;${x;s/.*\nabc\nghi\n([0-9]+)\ndef\n.*/\1/;p;}' input.txt

Here's what's basically going on:

H - appends the current line to sed's "hold space"
${ - specifies the start of a series of commands that will be run once we come to the end of the file
x - swaps the hold space with the pattern space, so that future substitutions will work on what was stored using H
s/../../ - analyses the pattern space (which is now multi-line), capturing the data specified in your question, replacing the entire pattern space with the bracketed expression...
p - prints the result.

One important factor here is that the regular expression is ERE, so the -E option is important. If your version of sed uses some other option to enable support for ERE, then use that option instead.

Another consideration is that the regex above assumes Unix-style line endings. If you try to process a text file that was generated on DOS or Windows, the regex may need to be a little different.
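
One way to handle DOS or Windows input without touching the regex is to strip the carriage returns before sed sees the data. The following is a minimal sketch that assumes tr is available and that the file contains no carriage returns worth keeping; the scratch file and the found variable exist only for this illustration.

```shell
# Build a copy of the sample input with DOS (CRLF) line endings.
dos=$(mktemp)
printf '%s\r\n' abc 12 34 def 56 abc ghi 78 def 90 > "$dos"

# Drop every carriage return first, then run the sed command from the
# answer unchanged on the cleaned stream.
found=$(tr -d '\r' < "$dos" |
    sed -Ene 'H;${x;s/.*\nabc\nghi\n([0-9]+)\ndef\n.*/\1/;p;}')
printf '%s\n' "$found"
rm -f "$dos"
```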

score -1 · Answer 5 · answered Nov 14 '17 at 11:09

awk solution:

awk '/ghi/ && r=="abc"{ f=1; n=NR+1 }f && NR==n{ v=$0 }v && NR==n+1{ print v }{ r=$0 }' file

The output:

78

Bonus GNU awk approach:

awk -v RS= 'match($0,/\nabc\nghi\n(.+)\ndef/,a){ print a[1] }' file
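
The three-argument match() is a GNU extension. As a hedged sketch of the same whole-file idea for other awks, the lines can be collected into one string and cut apart with match(), index() and substr(). The variable names buf, rest and body are made up here, and unlike the greedy (.+) above this version stops at the first def line and only looks at the first abc/ghi pair.

```shell
# Recreate the sample input from the question in a scratch file.
sample=$(mktemp)
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > "$sample"

body=$(awk '
    { buf = buf "\n" $0 }                  # collect the file, newline first
    END {
        buf = buf "\n"
        if (match(buf, /\nabc\nghi\n/)) {
            # Keep the newline that ends the ghi line so that a def line
            # following it directly is still recognised.
            rest = substr(buf, RSTART + RLENGTH - 1)
            i = index(rest, "\ndef\n")
            if (i > 1) print substr(rest, 2, i - 2)
        }
    }
' "$sample")
printf '%s\n' "$body"
rm -f "$sample"
```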

RomanPerekhrest

downvote without a comment doesn't give much point/meaning for possible answer improvement. Therefore, such downvote is pointless – RomanPerekhrest Nov 15 '17 at 12:00
