I am new to scripting and was trying to learn how to extract any text that exists between two different patterns. However, I am still not able to figure out how to extract text between two patterns in the following scenario:

If I have my input file reading:

Hi I would like
to print text
between these 
patterns

and my expected output is like:

I would like
to print text
between these 

That is, my first search pattern is "Hi"; I want to skip the pattern itself but print everything that follows it on the same line. My second search pattern is "patterns", and I want to avoid printing that line or any line after it.

I tried the following:

sed -n '/Hi/,/patterns/p' test.txt 

[output]

Hi I would like
to print text
between these 
patterns 

Next, I tried:

awk ' /'"Hi"'/ {flag=1;next} /'"pattern"'/{flag=0} flag { print }' test.txt

[output]

to print text
between these

Can someone help me out in identifying how to achieve this? Thanks in advance

Amarnath Revanna

4 Answers


You have the right idea, a mini state machine in awk, but it needs a few slight modifications, as the following transcript shows:

pax> echo 'Hi I would like
to print text
between these 
patterns ' | awk '
    /patterns/ { echo = 0 }
    /Hi /      { gsub("^.*Hi ", "", $0); echo = 1 }
               { if (echo == 1) { print } }'

Or, in compressed form:

awk '/patterns/{e=0}/Hi /{gsub("^.*Hi ","",$0);e=1}{if(e==1){print}}'

The output of that is:

I would like
to print text
between these 

as requested.

The way this works is as follows. The echo variable is initially unset, which awk treats as 0, so no echoing takes place.

Each line is checked in turn. If it contains patterns, echoing is disabled.

If it contains Hi followed by a space, echoing is turned on and gsub is used to modify the line, removing everything up to and including the Hi and the space after it.

Then, regardless, the line (possibly modified) is echoed when the echo flag is on.

Now, there are going to be edge cases such as:

  • lines containing two occurrences of Hi; or
  • lines containing something before the patterns.

You haven't specified how they should be handled so I didn't bother, but the basic concept should be the same.
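
For what it's worth, here is a minimal sketch of one way those edge cases could be handled. It is not part of the transcript above: the function name extract_between and the variable names are hypothetical, and it assumes you want to cut after the first "Hi " on a line, keep any text that precedes "patterns" on its line, and stop at the first "patterns" found.

```shell
# Hypothetical helper: print what follows the first "Hi " up to, but not
# including, the first "patterns". Markers are matched as plain strings.
extract_between() {
    awk '
        # Still searching: skip lines without the start marker,
        # otherwise keep only the text after its first occurrence.
        !inside {
            i = index($0, "Hi ")
            if (i == 0) next
            $0 = substr($0, i + 3)
            inside = 1
        }
        # Inside the block: print any text before the end marker and stop,
        # or print the whole line if the marker is absent.
        {
            j = index($0, "patterns")
            if (j > 0) {
                if (j > 1) print substr($0, 1, j - 1)
                exit
            }
            print
        }
    '
}

printf 'say Hi there Hi again\nmiddle\nend of text patterns tail\nafter\n' | extract_between
# prints:
#   there Hi again
#   middle
#   end of text
```

Using index() and substr() rather than a regex means the markers are treated literally, and the greedy-match question for repeated "Hi " on one line does not come up.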

paxdiablo
  • Thanks a lot for the answer and detailed description paxdiablo, your soln. works like a charm :-). In my case, I don't have repeated occurrences of pattern words along the same line nor any words occurring before "patterns". In my scenario, I can always identify the line starting which I would like to discard everything and this line always starts with the same pattern. Thanks again for the response, much appreciated :-) – Amarnath Revanna Oct 23 '12 at 04:54
  • A few issues: 1) "^.*Hi" is the same as "Hi" in an RE, 2) You don't need to specify $0 as the third arg for *sub(), 3) You don't need a gsub() when you just want to replace 1 occurrence, and 4) "{ if (echo == 1) { print } }" is equivalent to just "echo" on its own. – Ed Morton Oct 23 '12 at 18:15
  • Ed. Re 1, no, it's not, not when you're substituting - the difference is between subbing just the hi or everything on the line up to and including the hi. Other points are valid though mostly stylistic. – paxdiablo Oct 23 '12 at 23:38
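
The difference discussed in point 1 only shows up when something precedes the marker, which is not the case in the question's input. A small sketch with a made-up line (the sample text and variable names are illustrative only):

```shell
# Sample line with text in front of the marker (made up for illustration).
line='Oh, Hi I would like'

# Unanchored "Hi " removes only the marker; the text before it survives.
only_marker=$(printf '%s\n' "$line" | awk '{ sub(/Hi /, ""); print }')

# "^.*Hi " also consumes everything that precedes the marker.
through_marker=$(printf '%s\n' "$line" | awk '{ sub(/^.*Hi /, ""); print }')

printf '%s\n' "$only_marker"      # prints: Oh, I would like
printf '%s\n' "$through_marker"   # prints: I would like
```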

Updated the solution to remove the "patterns" line:

$ sed -n '/^Hi/,/patterns/{s/^Hi //;/^patterns/d;p;}' file
I would like
to print text
between these
Guru

This might work for you (GNU sed):

sed '/Hi /!d;s//\n/;s/.*\n//;ta;:a;s/patterns.*$//;tb;$!{n;ba};:b;/^$/d' file
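
Since the one-liner is dense, here is the same script spread over several lines with a comment per step. This is a reading aid under the same GNU sed assumption (the \n in the replacement is a GNU extension); the wrapper name between_sed is hypothetical.

```shell
# Hypothetical wrapper around the one-liner above; reads from stdin.
between_sed() {
    sed '
        # Skip every line that does not contain the start marker.
        /Hi /!d
        # The empty regex reuses "Hi ": swap the marker for a newline,
        s//\n/
        # then drop everything up to and including that newline.
        s/.*\n//
        # Branch to the next label only to reset the substitution flag.
        ta
        :a
        # If the end marker is on this line, cut it and the rest, then finish.
        s/patterns.*$//
        tb
        # Otherwise print the line, load the next one and test it again.
        $!{
            n
            ba
        }
        :b
        # Suppress the line if cutting the end marker left it empty.
        /^$/d
    '
}

printf 'Hi I would like\nto print text\nbetween these \npatterns \n' | between_sed
# prints:
#   I would like
#   to print text
#   between these
```

Once the end marker has been handled, sed starts a fresh cycle, so a later line containing "Hi " would open a new block.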
potong

Just set a flag (f) when you find+replace Hi at the start of a line, clear it when you find patterns, then invoke the default print when the flag is set:

$ awk 'sub(/^Hi /,""){f=1} /patterns/{f=0} f'  file
I would like
to print text
between these
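
If the two markers change from file to file, the same idea can be parameterized. A rough sketch, assuming the markers are passed as awk regular expressions; the function name between and the variable names start and stop are hypothetical:

```shell
# Hypothetical wrapper: $1 is the start marker, $2 the end marker.
# Both are used as regular expressions; input is read from stdin.
between() {
    awk -v start="$1" -v stop="$2" '
        # Strip the start marker from the beginning of the line and start printing.
        sub("^" start, "") { f = 1 }
        # Stop printing when the end marker appears.
        $0 ~ stop { f = 0 }
        # A bare pattern triggers the default action: print while the flag is set.
        f
    '
}

printf 'Hi I would like\nto print text\nbetween these \npatterns \n' | between 'Hi ' 'patterns'
# prints:
#   I would like
#   to print text
#   between these
```

Note that awk -v processes backslash escapes in the assigned value, so markers containing backslashes or regex metacharacters would need extra care.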
Ed Morton