grep or sed to match a block between start and end using same pattern

Question

I have a file with the following info:

start pattern1
line1
line2
...
end pattern1
line3
line4
start pattern2
...

My output should be:
start pattern1 line1 line2 end pattern1

If I know what pattern1 is, I can do

sed -n '/start pattern1/,/end pattern1/p' <file>

But here I want to match pattern1 with a regex (like \S+ in Perl) and reuse the matched text (like $1) in the end pattern. How can I do that?

What is the expected output, could you please mention that also in your post. — RavinderSingh13, Aug 15 '17 at 08:07
You still can use `sed -n '/pattern1/,/pattern1/p' input` for your request — CWLiu, Aug 15 '17 at 08:30
Could you give a real example? Are the words `start` and `end` present? Or only `pattern`? — Toto, Aug 15 '17 at 13:26
Never use range expressions, always use a flag instead. That means you can't use sed of course - see https://stackoverflow.com/a/17914105/1745001 and https://stackoverflow.com/q/23934486/1745001 for how to print text between conditions. — Ed Morton, Aug 15 '17 at 18:27
To clarify my question: 'start' and 'end' are prefix keywords to an unknown `pattern1` that I would like to match. — user2623661, Aug 16 '17 at 23:33

zdim · Accepted Answer · 2017-08-18T19:18:10.337

4

With the three-dot form of Perl's range operator, the two patterns aren't tested on the same line:

perl -wnE 'print if /start ([A-Za-z0-9_:]+)/ ... /end $1/' input.txt

Updated to the actual pattern, specified in comments.

I tested this with captures, using a do block in place of the bare print, and it worked. Problems may lie in wait if another regex in the code also captures, because $1 would then refer to that other match. As long as no other regex captures anything, this works.

Note the use of ... instead of .. here: with three dots, the right operand is not tested on the line that made the left operand true, only from the next evaluation on.

edited Aug 18 '17 at 19:18

answered Aug 15 '17 at 08:19

zdim


I tried this as you have written it (without do) and it doesn't seem to work. It prints the whole file. – user2623661 Aug 16 '17 at 23:30
@user2623661 I just tried again (it was tested when I posted), copy-pasting the one-liner and running it on a file like your example, with lines added before and after. It works. What is your real input like? – zdim Aug 16 '17 at 23:33
@user2623661 Hang on ... I noticed your comment, that `pattern1` is "_unknown_" -- what do you mean by that? What kind of a thing is it? Is it just anything that follows "start" (or your actual "_prefix keyword_"), or is there some rule to what it can be? – zdim Aug 16 '17 at 23:36
it is a 'word' meaning no space and has some special characters [A-Za-z0-9_:] – user2623661 Aug 17 '17 at 00:14
@user2623661 Thanks, updated, works in my test. I added multiple such sections to my test file, with other lines before, between, and after. – zdim Aug 17 '17 at 00:20

potong · Answer 2 · 2017-08-15T22:22:25.707

2

This might work for you (GNU sed):

sed -n '/pattern/,//p' file

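This sets up a range, which behaves like a flip-flop: it switches on at a line matching /pattern/ and off at the next line matching the end address. The empty // reuses the last regexp, so the range ends at the next line matching the same pattern. The p command prints every line while the range is switched on.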

N.B. the -n option turns off the automatic printing, which gives sed its grep-like behaviour.
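A small sketch of the flip-flop behaviour described above, run on a made-up sample file (the file contents and the `sample` variable are assumptions for illustration, not part of the original answer):

```shell
# Build a small sample file (contents are hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
before
pattern1
line1
line2
pattern1
after
EOF

# The range opens at the first /pattern1/ line. The empty // reuses that
# regexp as the closing address, so the range closes at the next match.
sed -n '/pattern1/,//p' "$sample"
```

On this sample the command prints the four lines from the first pattern1 line through the second one. It relies on both delimiters matching the same regexp, so it does not directly cover a block whose first and last lines carry different prefixes.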

An alternative:

sed '/pattern/!d;:a;n;//!ba' file

edited Aug 15 '17 at 22:22

answered Aug 15 '17 at 16:54

potong


+1. `//` is the sed magic here. That said, it would break with any substitution or matching in the range. You'd have to pipe sed to sed if you wanted that. – stevesliva Aug 15 '17 at 17:26
I am guessing this won't work in my case, since I have a 'start' and 'end' prefix to 'pattern'. – user2623661 Aug 16 '17 at 23:32

score 1 · Answer 3 · answered Aug 15 '17 at 08:18

1

Using awk to print between pattern1s (inclusive):

$ awk '/pattern1/{p=!p;print;next} p' file
pattern1
line1
line2
...
pattern1

The regex could be defined better, like /^pattern1$/ or $0=="pattern1".
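To illustrate why the tighter test can matter, here is a sketch on a made-up file in which a line inside the block merely mentions pattern1 (the file contents and the `sample` variable are assumptions for illustration):

```shell
# Sample file (hypothetical): one line inside the block contains the
# delimiter text without being a delimiter itself.
sample=$(mktemp)
cat > "$sample" <<'EOF'
before
pattern1
line1
see pattern1 docs
line2
pattern1
after
EOF

# Regex match: "see pattern1 docs" also toggles the flag, so the block is
# cut short and the real closing delimiter reopens the range.
awk '/pattern1/{p=!p;print;next} p' "$sample"

echo ---

# Whole-line comparison: only lines that are exactly "pattern1" toggle.
awk '$0=="pattern1"{p=!p;print;next} p' "$sample"
```

With the regex version, line2 is dropped and the trailing "after" line is printed. The whole-line comparison prints exactly the lines from the first pattern1 through the second.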

answered Aug 15 '17 at 08:18

James Brown


RavinderSingh13 · Answer 4 · 2017-08-15T09:21:49.997

0

Try the following awk solution too, and let me know if it helps.

awk -v RS="" '{match($0,/start pattern1.*start pattern1/);print substr($0,RSTART,RLENGTH)}'  Input_file

EDIT: The OP hasn't shown whether Input_file can contain empty lines. Following CWLiu's comment, I am adding a suggestion that also works when empty lines are present.

awk '/start pattern1/{print; while ((getline) > 0) {print; if ($0 ~ /start pattern1/) break}}' Input_file

edited Aug 15 '17 at 09:21

answered Aug 15 '17 at 08:34

RavinderSingh13


This doesn't work if empty lines are included between pattern1 – CWLiu Aug 15 '17 at 08:47
@CWLiu: The OP hasn't mentioned that, so it's not an issue. I have added one more solution that also handles empty lines. – RavinderSingh13 Aug 15 '17 at 09:22

score 0 · Answer 5 · answered Aug 15 '17 at 15:36

So, here's an awk implementation based on an alternative interpretation of your question (since it's not quite clear).

If you want to detect the pattern1 from the first line that starts with start, and then print every line until the end pattern1, you can do it like this:

$ awk '/^start / {pat=$2; next}  /^end / && $2==pat {exit}  {print}' file
line1
line2
...
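Under the other reading of the question, where the start and end lines should be printed too and every block is wanted, the same flag idea can be sketched as follows. This is one possible variant, not the code from the answer above; the sample file and the names `print_blocks`, `inside`, and `name` are assumptions for illustration:

```shell
# Sample file modelled on the layout in the question (contents are
# hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
start pattern1
line1
line2
end pattern1
line3
line4
start pattern2
line5
end pattern2
EOF

# print_blocks FILE: print every block that runs from "start NAME" to the
# matching "end NAME", where NAME is whatever word follows "start".
print_blocks() {
  awk '
    # Outside a block: a "start NAME" line opens one and records NAME.
    !inside && $1 == "start" { name = $2; inside = 1 }
    # Inside a block (first and last line included): print the line.
    inside { print }
    # "end NAME" closes the block only if NAME equals the recorded one.
    inside && $1 == "end" && $2 == name { inside = 0 }
  ' "$1"
}

print_blocks "$sample"
```

The name is compared with == rather than matched as a regex, so characters such as : in the name need no escaping. To stop after the first block, the closing rule could call exit instead of resetting the flag.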
