awk concatenate strings till contain substring

Question

I have an awk script taken from this example:

awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file

Here's a sample file with lines:

$ cat file
START
1
2
3
4
5
end
6
7
START
1
2
3
end
5
6
7

I need to stop concatenating once the joined string contains the word end, so the desired output is:

START,1,2,3,4,5,end
START,1,2,3,end

Nevertheless, 5 users gave me an answer to it. So it seems it's OK for most people, even if it doesn't contain a question mark. — d.ansimov, Dec 30 '17 at 18:07

RomanPerekhrest · Accepted Answer · 2017-12-13T17:20:55.127

8

Short Awk solution (though it will check for /end/ pattern twice):

awk '/START/,/end/{ printf "%s%s",$0,(/^end/? ORS:",") }' file

The output:

START,1,2,3,4,5,end
START,1,2,3,end

/START/,/end/ - range pattern

A range pattern is made of two patterns separated by a comma, in the form ‘begpat, endpat’. It is used to match ranges of consecutive input records. The first pattern, begpat, controls where the range begins, while endpat controls where the pattern ends.

/^end/? ORS:"," - set delimiter for the current item within a range
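To see which records a range pattern selects on its own, here is a minimal sketch. The input lines are made up for illustration and are not the asker's file; with no action given, awk prints each record that falls inside the range.

```shell
# Hypothetical sample input: only the lines from START through end should survive.
out=$(printf '%s\n' before START 1 2 end after |
      awk '/START/,/end/')   # no action given, so matching records are printed as-is
printf '%s\n' "$out"
```

The answer's one-liner applies the same range and only changes how each selected record is terminated: "," inside the range, ORS on the end line.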

edited Dec 13 '17 at 17:20

answered Dec 13 '17 at 15:14

RomanPerekhrest


2

Never use range expressions as they make trivial scripts very slightly briefer but require complete rewrites or duplicate conditions (e.g. testing for "end" twice in this case) when the requirements get just the tiniest bit more interesting. Always use a flag variable instead, e.g. https://stackoverflow.com/a/47796423/1745001 – Ed Morton Dec 13 '17 at 15:50
@EdMorton, yes, I realize that it will check for the `/end/` pattern twice. Let it just be an alternative to the "flag"-based approach. I wouldn't call my suggested approach the only good one ... the other answers are good enough – RomanPerekhrest Dec 13 '17 at 17:18
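For comparison, a flag-based variant that tests for end only once per record might look like the sketch below. It is an illustration of the idea discussed in the comments, not necessarily the code in the linked answer; the flag name inblock and the sample input are assumptions.

```shell
# Hypothetical sample input containing two START..end blocks plus stray lines.
out=$(printf '%s\n' 0 START 1 2 end 3 START 4 end 5 |
      awk '
        /START/ { inblock = 1 }                  # hypothetical flag name
        inblock {
            if (/end/) { print; inblock = 0 }    # the only test for "end"
            else       { printf "%s,", $0 }      # still inside the block: join with a comma
        }')
printf '%s\n' "$out"
```

Because the state lives in a variable, extra conditions (for example skipping certain lines inside a block) can be added without restructuring the script.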

karakfa · Answer 2 · 2017-12-13T16:47:24.193

4

Here is another awk solution:

$ awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' file

START,1,2,3,4,5,end
START,1,2,3,end

Note that /end/ && ORS=RS; is a shortened form of /end/{ORS=RS; print}
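The shorthand works because the assignment ORS=RS evaluates to a non-empty string, which counts as true, so the default print action runs. Spelled out with explicit actions, the same logic can be sketched as follows; the sample input is made up for illustration.

```shell
# Hypothetical sample input; lines outside START..end should be dropped.
out=$(printf '%s\n' x START 1 2 end 3 START 4 end 5 |
      awk '
        /START/ { ORS = "," }         # inside a block: terminate records with a comma
        /end/   { ORS = RS; print }   # closing line: restore newline and print it
        ORS != RS                     # print only while ORS is still ","
      ')
printf '%s\n' "$out"
```

Here ORS doubles as the "inside a block" flag, which is why no separate variable is needed.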

edited Dec 13 '17 at 16:47

answered Dec 13 '17 at 16:09

karakfa


score 2 · Answer 3 · answered Dec 13 '17 at 15:14

You can use this awk:

awk '/START/{p=1; x=""} p{x = x (x=="" ? "" : ",") $0} /end/{if (x) print x; p=0}' file

START,1,2,3,4,5,end
START,1,2,3,end

answered Dec 13 '17 at 15:14

anubhava


score 2 · Answer 4 · answered Dec 13 '17 at 16:10

Another way, similar to answers in How to select lines between two patterns?

$ awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' ip.txt
START,1,2,3,4,5,end
START,1,2,3,end

This doesn't need a buffer, but it doesn't check whether START has a corresponding end.
/START/{ORS=","; f=1} - set ORS to "," and set a flag (which controls which lines are printed)
/end/{ORS=RS; print; f=0} - on the ending condition, set ORS to newline, print the line and clear the flag
f - print the input record as long as the flag is set
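If an unterminated START block should be discarded rather than printed, one option is to go back to buffering and print only when end is reached. This is a minimal sketch under that assumption; the variable names buf and inblock and the sample input are hypothetical.

```shell
# Hypothetical input: the first START has no matching end and should be dropped.
out=$(printf '%s\n' START a b START 1 2 end 3 |
      awk '
        /START/ { buf = $0; inblock = 1; next }       # new block: discard any unfinished buffer
        inblock { buf = buf "," $0 }                  # append the current line to the buffer
        inblock && /end/ { print buf; inblock = 0 }   # block complete: emit it
      ')
printf '%s\n' "$out"
```

On the same input, the ORS-based one-liner would print START,a,b,START,1,2,end, since it emits lines as soon as it sees them.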

score 0 · Answer 5 · answered Dec 13 '17 at 16:33

Since we seem to have gone down the rabbit hole with ways to do this, here's a fairly reasonable approach with GNU awk for multi-char RS, RT, and gensub():

$ awk -v RS='end' -v OFS=',' 'RT{$0=gensub(/.*(START)/,"\\1",1); $NF=$NF OFS RT; print}' file
START,1,2,3,4,5,end
START,1,2,3,end
