5

I have an awk script from this example:

awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file

Here's a sample file with lines:

$ cat file
START
1
2
3
4
5
end
6
7
START
1
2
3
end
5
6
7

I need to stop concatenating as soon as the line containing the word end has been appended, so the desired output is:

START,1,2,3,4,5,end
START,1,2,3,end
d.ansimov

5 Answers

8

Short Awk solution (though it will check for /end/ pattern twice):

awk '/START/,/end/{ printf "%s%s",$0,(/^end/? ORS:",") }' file

The output:

START,1,2,3,4,5,end
START,1,2,3,end

  • /START/,/end/ - range pattern

A range pattern is made of two patterns separated by a comma, in the form ‘begpat, endpat’. It is used to match ranges of consecutive input records. The first pattern, begpat, controls where the range begins, while endpat controls where the pattern ends.

  • /^end/? ORS:"," - set delimiter for the current item within a range
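To see what the range pattern selects on its own, here is a minimal sketch. The temporary file and the variable names `sample` and `selected` are assumptions made for illustration, and the input is a shortened version of the sample file:

```shell
# Recreate a shortened version of the sample input in a temporary file.
sample=$(mktemp)
printf '%s\n' START 1 2 end 6 START 3 end 7 > "$sample"

# The range pattern alone selects every record from a START line
# through the next end line; records outside a range are skipped.
selected=$(awk '/START/,/end/' "$sample")
printf '%s\n' "$selected"

rm -f "$sample"
```

The lines 6 and 7 fall outside both ranges and are dropped; the printf in the answer then only has to decide which separator follows each selected line.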
RomanPerekhrest
  • 2
    Never use range expressions as they make trivial scripts very slightly briefer but require complete rewrites or duplicate conditions (e.g. testing for "end" twice in this case) when the requirements get just the tiniest bit more interesting. Always use a flag variable instead, e.g. https://stackoverflow.com/a/47796423/1745001 – Ed Morton Dec 13 '17 at 15:50
  • @EdMorton, yes, I realize that it will check for the `/end/` pattern twice. Let it just be an alternative to the "flag"-based approach. I wouldn't call my suggested approach the only or the best one; the other answers are good enough – RomanPerekhrest Dec 13 '17 at 17:18
4

Here is another awk solution:

$ awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' file

START,1,2,3,4,5,end
START,1,2,3,end

Note that /end/ && ORS=RS; is a shortened form of /end/{ORS=RS; print}
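A quick way to check that equivalence is to run both forms on the same input. This is a sketch only: the temporary file and the names `sample`, `short_form` and `long_form` are assumptions for illustration, and the input is a shortened version of the sample file.

```shell
sample=$(mktemp)
printf '%s\n' START 1 2 end 6 START 3 end 7 > "$sample"

# Short form: on an end line the pattern /end/ && ORS=RS is true, and
# evaluating it also resets ORS; the default action then prints the line.
short_form=$(awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' "$sample")

# Expanded form: the same logic with the action written out explicitly.
long_form=$(awk '/START/{ORS=","} /end/{ORS=RS; print} ORS!=RS' "$sample")

printf '%s\n' "$short_form"
rm -f "$sample"
```

The last rule, ORS!=RS, prints a line only while ORS is still a comma, which is why lines before the first START and after each end are skipped.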

karakfa
2

You can use this awk:

awk '/START/{p=1; x=""} p{x = x (x=="" ? "" : ",") $0} /end/{if (x) print x; p=0}' file

START,1,2,3,4,5,end
START,1,2,3,end
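The same flag-and-buffer logic can be spread over several lines with comments, which may make it easier to follow. This is a sketch under assumptions: the temporary file and the names `sample` and `joined` are illustrative, the input is a shortened version of the sample file, and the buffer is additionally cleared after printing (not in the one-liner above) so that a stray end line cannot reprint an old block.

```shell
sample=$(mktemp)
printf '%s\n' START 1 2 end 6 START 3 end 7 > "$sample"

joined=$(awk '
  /START/ { p = 1; x = "" }                   # block begins: set the flag, empty the buffer
  p       { x = x (x == "" ? "" : ",") $0 }   # inside a block: append the line, comma-separated
  /end/   { if (x) print x; p = 0; x = "" }   # block ends: print the buffer, clear flag and buffer
' "$sample")
printf '%s\n' "$joined"

rm -f "$sample"
```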
anubhava
2

Another way, similar to the answers in "How to select lines between two patterns?":

$ awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' ip.txt
START,1,2,3,4,5,end
START,1,2,3,end
  • this doesn't need a buffer, but it doesn't check whether each START has a corresponding end
  • /START/{ORS=","; f=1} sets ORS to , and sets a flag (which controls which lines are printed)
  • /end/{ORS=RS; print; f=0} resets ORS to a newline on the ending condition, prints the line, and clears the flag
  • f prints the input record as long as the flag is set
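To illustrate the first point, here is a sketch comparing the command above with a buffered variant on input where one START has no matching end. The buffered command, the temporary file, and the names `sample`, `unbuffered`, `buffered`, `buf` are assumptions for illustration, not part of the answer.

```shell
sample=$(mktemp)
# The first START below has no matching end.
printf '%s\n' START 1 2 START 3 end 7 > "$sample"

# Unbuffered command from the answer: the unterminated block is printed
# anyway and runs straight into the next one.
unbuffered=$(awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' "$sample")

# Buffered variant: each START discards whatever was collected before it,
# so only blocks that actually reach an end line are printed.
buffered=$(awk '/START/{buf=""; f=1} f{buf = buf (buf=="" ? "" : ",") $0} /end/{if (f) print buf; f=0}' "$sample")

printf 'unbuffered: %s\n' "$unbuffered"
printf 'buffered:   %s\n' "$buffered"
rm -f "$sample"
```

The unbuffered run yields START,1,2,START,3,end, while the buffered one yields only START,3,end. Which behaviour is preferable depends on whether unterminated blocks can occur in the real input.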
Sundeep
0

Since we seem to have gone down the rabbit hole with ways to do this, here's a fairly reasonable approach with GNU awk for multi-char RS, RT, and gensub():

$ awk -v RS='end' -v OFS=',' 'RT{$0=gensub(/.*(START)/,"\\1",1); $NF=$NF OFS RT; print}' file
START,1,2,3,4,5,end
START,1,2,3,end
Ed Morton