How to extract text between two words in unix?

Question

I
am
using
basic
sed
expression :-

sed -n "am/,/sed/p"

to get the text between "am" and "sed", which outputs "am \n using \n basic \n sed". But my real problem arises when the input is:

I
am
using
basic
grep
expression.

When I applied the same sed command to this input, it gave "am \n using \n basic \n grep \n expression", which it should not. How can I discard the output when there is no match for the end word?

Any suggestions?

What's with all the '`\n`' characters you so carefully added in your edit? They simply make your question illegible - but I can't quite work out what they're for, so it is hard to say whether simply deleting them is sensible... — Jonathan Leffler, May 25 '11 at 17:20
I am a newbie; I will try to make it legible. Feel free to suggest ways to make the question more readable. Thanks for the suggestion. — crazy_prog, May 25 '11 at 17:36
Amitesh: it's fine to be a newbie - we all were once. I'm just confused about what you're trying to achieve with the '`\n`' notation. It looks as if you are thinking "if you take this question as input, ...", but I'm not sure. Otherwise, I'd simply delete all occurrences of '`\n`', not least because the sequence has a meaning in `sed` scripts which I don't think you are after. — Jonathan Leffler, May 25 '11 at 18:09
See [Print text between delimiters on multiple lines using sed](http://stackoverflow.com/questions/5972908/print-text-between-delimiters-on-multiple-lines-using-sed). That was dealing with text between '(' and ')', but there isn't all that much difference between it and what you're after. The main difference is that the ')' is a single character, so a negated character class '`[^)]*`' handles 'skip over the uninteresting stuff'. It isn't quite so simple with multi-character delimiters. — Jonathan Leffler, May 25 '11 at 18:15
Hey Jonathan, I meant to make the picture clearer, as there were newlines between the words and that matters in sed. — crazy_prog, May 25 '11 at 18:35
I admire your creativity; I am not sure that I find it easy to read, in either form. Sorry to rain on your parade, but straight-forward English is easiest for all to handle. — Jonathan Leffler, May 25 '11 at 20:32

bmk · Accepted Answer · 2011-05-25T16:54:16.827

13

The command in the question (sed -n "/am/,/sed/p", note the added slash) means:

Find a line containing the string am
and print (p) until a line containing sed occurs

Therefore it prints:

I am using basic grep expression

because it contains am. If you add more lines, they are printed too, until a line containing sed occurs.

E.g.:

echo -e 'I am using basic grep expression.\nOne more line\nOne with sed\nOne without' | sed -n "/am/,/sed/p"

results in:

I am using basic grep expression.
One more line
One with sed

I think what you want is something like this:

sed -n "s/.*\(am.*sed\).*/\1/p"

Example:

echo 'I am using basic grep expression.' | sed -n "s/.*\(am.*sed\).*/\1/p"

echo 'I am using basic sed expression.' | sed -n "s/.*\(am.*sed\).*/\1/p"
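Because sed processes one line at a time, the substitution above only helps when both words are on the same line. The following is a minimal sketch for the question's one-word-per-line input, assuming it is acceptable to join all lines with spaces first. The helper name join_and_extract is made up for illustration, and plain substring matching means "am" would also match inside a longer word.

```sh
# Hypothetical helper: join all input lines into one line (paste -s),
# then apply the single-line substitution from the answer above.
join_and_extract() {
  paste -sd ' ' - | sed -n 's/.*\(am.*sed\).*/\1/p'
}

# Stop word present: prints "am using basic sed"
printf 'I\nam\nusing\nbasic\nsed\nexpression\n' | join_and_extract

# Stop word missing: prints nothing
printf 'I\nam\nusing\nbasic\ngrep\nexpression\n' | join_and_extract
```

The original line breaks are lost in the output, so this only fits cases where the extracted text may be flattened to a single line.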

edited May 25 '11 at 16:54

answered May 25 '11 at 16:46

bmk

I have to be more detailed. Actually, there are multiple lines and we want to extract the lines between two words. If both the "start word" and the "stop word" match, the above expression is OK. But if there is only the "start word" and no "stop word", it outputs all the lines after the "start word". I hope that makes it clear now. Thanks. – crazy_prog May 25 '11 at 17:01
@amitesh: OK - in that case anubhava's answer is what you need. – bmk May 25 '11 at 17:10
Is there a way to get the string between the two words without including the words themselves, i.e. am and sed should not be part of the output? – Mehul Thakkar Oct 30 '13 at 04:09
Also, your answer gives only the first match (or maybe the last match), but what if I want all the matches? – Mehul Thakkar Oct 30 '13 at 04:10

anubhava · Answer 2 · 2011-05-25T17:52:02.463

3

You have to use a slightly different sed command, like:

sed -n '/am/{:a; /am/x; $!N; /sed/!{$!ba;}; /sed/{s/\n/ /gp;}}' file

This prints output ONLY when the text from am to sed is found, even if it spans multiple lines.
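For readers who find the hold-space script hard to follow, the same "print the block only if the stop word shows up" idea can be sketched in awk. This is an illustrative alternative, not the answer's method: the function name extract_between and its arguments are assumptions, and both words are treated as regular expressions matched anywhere in a line.

```sh
# Hypothetical helper: extract_between START STOP [FILE...]
# Buffers lines from the first line matching START and prints the buffer
# only once a line matching STOP is seen. An unterminated block is dropped.
extract_between() {
  start=$1
  stop=$2
  shift 2
  awk -v start="$start" -v stop="$stop" '
    !inblock && $0 ~ start { inblock = 1; buf = "" }       # open a block
    inblock                { buf = buf $0 "\n" }           # collect lines
    inblock && $0 ~ stop   { printf "%s", buf; inblock = 0 }  # close and print
  ' "$@"
}

# Prints am, using, basic, sed on separate lines
printf 'I\nam\nusing\nbasic\nsed\nexpression\n' | extract_between am sed

# Prints nothing, because "sed" never appears
printf 'I\nam\nusing\nbasic\ngrep\nexpression\n' | extract_between am sed
```

Unlike a sed address range, a single line that contains both words opens and closes a block by itself in this sketch.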

edited May 25 '11 at 17:52

answered May 25 '11 at 17:03

anubhava

The words are on different lines. Please review the question. – crazy_prog May 25 '11 at 17:21
Can you please post a sample of your input file in your question? – anubhava May 25 '11 at 17:24
The first line of the question is the input file. \n represents newline. – crazy_prog May 25 '11 at 17:37

Noam Manos · Answer 3 · 2013-09-18T05:37:33.657

1

This can work with sed, but the syntax is quite overwhelming. If you need to crop part of a multi-line (\n) text, you might want to try a simpler way using grep:

cat multi_line.txt | grep -ozP '(?s)(?<=START phrase).*(?=END phrase)'

For example, I find this the easiest way to grab a Perforce changelist description (without the rest of the CL info):

p4 describe {CL NUMBER} | grep -ozP '(?s).*(?=Affected files)'

Note: you can keep or drop the lookbehind `(?<=...)` and the lookahead `(?=...)` to exclude or include the starting/ending phrases in the output.
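A small sketch of that include/exclude choice on the question's sample input, assuming GNU grep built with PCRE support. Since grep normally matches one line at a time, the sketch passes -z so the whole input is treated as a single record, and converts the NUL terminator that -z emits back to a newline. The function names are made up for illustration.

```sh
# Exclusive: lookbehind/lookahead keep "am" and "sed" out of the match.
between_exclusive() {
  grep -ozP '(?s)(?<=am\n).*(?=\nsed)' | tr '\0' '\n'
}

# Inclusive: without lookarounds, the delimiters are part of the match.
between_inclusive() {
  grep -ozP '(?s)am\n.*\nsed' | tr '\0' '\n'
}

# Prints using, basic on separate lines
printf 'I\nam\nusing\nbasic\nsed\nexpression\n' | between_exclusive

# Prints am, using, basic, sed on separate lines
printf 'I\nam\nusing\nbasic\nsed\nexpression\n' | between_inclusive
```

Both patterns are greedy, so with several start/stop pairs in the input they match from the first start to the last stop.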

edited Sep 18 '13 at 05:37

answered Sep 17 '13 at 19:26

Noam Manos

`grep -P` is a non-standard extension which isn't generally portable, though it works on GNU `grep` and possibly some other versions. (macOS `grep` used to have this option, but removed it.) – tripleee Feb 09 '21 at 05:58
