Sed to extract text between two strings

Question

Please help me with sed. I have a file like the one below.

START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=B
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=C
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

I want to get the text between START=A and END. I used the command below.

sed '/^START=A/, / ^END/!d' input_file

The problem here is that I am getting

START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

instead of

START=A
  xxxxx
  xxxxx
END

Sed seems to match greedily.

Please help me in resolving this.

Thanks in advance.

Can I use awk to achieve the above?

The other common FAQ is "from a line containing two tokens, how do I extract the text between them"; this is https://stackoverflow.com/questions/13242469/how-to-use-sed-grep-to-extract-text-between-two-words — tripleee, Jul 30 '20 at 05:42

Jonathan Leffler · Accepted Answer · 2013-05-20T06:09:02.357

27

sed -n '/^START=A$/,/^END$/p' data

The -n option means 'don't print by default'; then the script says 'do print between the line containing START=A and the next END'.

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
START=C
  xxx11
  xxx12
END
START=A
  xxx13
  xxx14
END
START=D
  xxx15
  xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=A
  xxx09
  xxx10
END
START=A
  xxx13
  xxx14
END

Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
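
As a rough illustration of those two alternative requirements, here is a minimal sketch, not something the question asked for. The file name data.txt and its three short blocks are made up for the example. The sed command prints the range and quits once the first END has been printed; the awk command buffers each START=A block and keeps only the most recent one.

```shell
# Hypothetical sample file; the name and contents are assumptions for illustration.
printf '%s\n' 'START=A' '  xxx01' 'END' \
              'START=B' '  xxx02' 'END' \
              'START=A' '  xxx03' 'END' > data.txt

# First block only: print lines in the range, then quit at the first END inside it.
first_block=$(sed -n '/^START=A$/,/^END$/{p;/^END$/q;}' data.txt)

# Last block only: restart the buffer at every START=A, remember it at each closing END.
last_block=$(awk '
    /^START=A$/ { buf = ""; in_block = 1 }      # a new START=A block begins
    in_block    { buf = buf $0 ORS }            # collect lines while inside a block
    /^END$/     { if (in_block) last = buf; in_block = 0 }
    END         { printf "%s", last }           # emit only the final collected block
' data.txt)

printf 'first:\n%s\n' "$first_block"
printf 'last:\n%s\n' "$last_block"
```

The sed variant stops reading as soon as it has what it needs, which may matter on large files; the awk variant has to read the whole file because it cannot know which block is the last one until the end.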

edited May 20 '13 at 06:09

answered May 20 '13 at 05:51

Jonathan Leffler



Thanks for the reply. I need text between START=A and the next END, the above one gives data between START=A and last END. Hope you got my prob. – ranganath111 May 20 '13 at 05:58
No, it doesn't. Both the `awk` and the `sed` scripts — at least on my machine with my copy of the data file you provided — print 5 blocks of data between `START=A` and `END`, and the blocks with `START=B` to `END`, `START=C` to `END` and `START=D` to `END` are all omitted from the output. Which platform are you testing on? Which version of `sed` are you using? Which version of `awk` are you using? (I note that your test data repeats verbatim the blocks between `START=A` and `END`. It would be much better if you had different lines in between so you could see which lines are being printed.) – Jonathan Leffler May 20 '13 at 06:00
When I test this, the start and end tokens are included in the output, while I had the impression the OP wanted only data BETWEEN them. – Mr. Developerdude Sep 20 '16 at 21:35

@LennartRolland: The sample desired output specifically includes the `START=A` and `END` lines. If you don't want the start and end markers to appear, you can use `sed` like this: `sed -n -e '/^START=A$/,/^END$/ { /^START=A$/d; /^END$/d; p; }'`. Or, you can use `awk` like this: `awk '/^START=A$/,/^END$/ { if ($0 != "START=A" && $0 != "END") print }'` (same basic idea, though you can code the condition in a number of different ways if desired) – Jonathan Leffler Sep 20 '16 at 22:08

xagyg · Answer 2 · 2013-05-20T06:23:51.440

4

Basic version ...

sed -n '/START=A/,/END/p' yourfile

More robust version...

sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile
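
To see what the anchored version guards against, here is a small sketch. The file sample.txt and the START=AB marker are invented for the demonstration and do not come from the question. The unanchored pattern also starts a range at START=AB, while the anchored one only accepts START=A, optionally surrounded by spaces.

```shell
# Hypothetical input: one block with a look-alike marker, one indented START=A block.
printf '%s\n' 'START=AB' '  other' 'END' \
              '  START=A  ' '  wanted' '  END' > sample.txt

# Unanchored: /START=A/ matches anywhere in a line, so START=AB opens a range too.
basic=$(sed -n '/START=A/,/END/p' sample.txt)

# Anchored: the whole line must be START=A (or END) apart from surrounding spaces.
robust=$(sed -n '/^ *START=A *$/,/^ *END *$/p' sample.txt)

printf 'basic:\n%s\n' "$basic"
printf 'robust:\n%s\n' "$robust"
```

Note that ' *' matches spaces only; if the file may contain tabs, a bracket expression such as [[:space:]]* would be one way to cover them.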

edited May 20 '13 at 06:23

answered May 20 '13 at 06:15

xagyg


can you explain what `,` means in sed pattern string? – Vikrant Singh Jul 08 '16 at 06:30
@Vikrant - the `,` separates two parts of a *range* defined by two regexes so that the lines between the first pattern and the second pattern are returned. – starfry Aug 19 '16 at 09:23

abasu · Answer 3 · 2013-05-20T06:21:53.027

2

Your sed expression has a space before ^END, i.e. / ^END/. So sed finds the starting pattern but never finds the ending pattern, and keeps printing until the end of the file. Use sed '/^START=A/, /^END/!d' input_file (notice /^END/).
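
The effect of the stray space can be reproduced with a short sketch. The two-block input_file below is an assumption made up for the demonstration, and the space after the comma is dropped so the commands stay as portable as possible. With / ^END/ the range opened by START=A never closes, so every line to the end of the file is kept; with /^END/ only the START=A block survives.

```shell
# Hypothetical two-block input; contents are assumptions for illustration.
printf '%s\n' 'START=A' '  a1' 'END' \
              'START=B' '  b1' 'END' > input_file

# Stray space: no line matches / ^END/, so the range runs to end of file.
with_space=$(sed '/^START=A/,/ ^END/!d' input_file)

# Space removed: the range closes at the first END after START=A.
fixed=$(sed '/^START=A/,/^END/!d' input_file)

printf 'with stray space:\n%s\n' "$with_space"
printf 'fixed:\n%s\n' "$fixed"
```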

edited May 20 '13 at 06:21

answered May 20 '13 at 06:07

abasu


Good point about the space in the `sed` regex, though it makes the quoted output even more puzzling (as in 'I cannot reproduce the quoted output with the original script, but drop the extraneous space and it works fine, albeit cackhanded'). You can at least simplify the last part of your `awk` script to `/END/{flag=0}` which might set flag to zero when it was already zero, but that does no harm. You can also use `/START=A/,/END/{print}` which is much simpler. – Jonathan Leffler May 20 '13 at 06:14
yea, `/START=A/,/END/{print}` this is much simpler, but it's already shown in your answer :) I was just playing around with a flag :). Actually, after the `awk` solution you have given, he does not need to do anything else. I'll remove my `awk` solution. It might lead to more confusion than doing any good :P – abasu May 20 '13 at 06:21
