Remove comma from last element in each block

Question

I've got a file with the following contents and want to remove the last comma in each block (in this case, the commas after 'c' and 'f').

heading1(
a,
b,
c,
);

some more text

heading2(
d,
e,
f,
);

This has to be done in bash, not Perl or Python etc., as these are not installed on my target system. I can use sed, awk etc., but I cannot use sed with the -z option as I'm using an old version of the utility.

So sed -zi 's/,\n);/\n);/g' $file is off the table.

Any help would be greatly appreciated. Thanks

Beta · Accepted Answer · score 1 · 2022-02-17T06:22:54.250

This might work in your version of sed. Then again it might not.

sed 'x;1d;G;/;$/s/,//;$!s/\n.*//' "$file"

Rough translation: "Swap this line with the hold space. If this is the first line, stop here (the line is now safe in the hold space and there is nothing to print yet). Append the hold space to the line in the buffer (so that you're looking at the previous line and the current one). If what you have ends with a semicolon, delete the first comma, which is the previous line's trailing comma. If you're not on the last line of the file, delete the second of the two lines you have (i.e. the current line, which we'll deal with after we see the next one)."
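To check the command against the question's sample, the sketch below writes that sample to a scratch file (the name input.txt is only an illustrative assumption) and runs the command unchanged. It was traced with GNU sed in mind; older seds may behave differently.

```shell
#!/bin/sh
# Write the sample input from the question to a scratch file
# (input.txt is an illustrative name, not part of the original answer).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# x          : swap, so the pattern space holds the previous line
# 1d         : on the first line there is nothing to print yet
# G          : append the current line (now in the hold space)
# /;$/s/,//  : if the current line ends with ";", drop the first comma
# $!s/\n.*// : except on the last line, keep only the previous line
sed 'x;1d;G;/;$/s/,//;$!s/\n.*//' input.txt
```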

edited Feb 17 '22 at 06:22

answered Feb 17 '22 at 05:46

Beta


This seems to ALMOST work. It removes the last 2 commas from the last section. – sharkbites Feb 17 '22 at 05:56
Correction: it duplicates the very last element. In my example, I get heading1( a, b, c ); some more text heading2( d, e, f f ); – sharkbites Feb 17 '22 at 06:14
@sharkbites: [facepalm] How did I not notice that? Editing... – Beta Feb 17 '22 at 06:20

score 1 · Answer 2 · answered Feb 17 '22 at 06:41


Using awk with RS="^$" to read in the whole file as a single record, and a regex to replace the matching parts of the text:

$ awk -v RS=^$ '{gsub(/,\n\);/,"\n);")}1' file

Some output:

heading1(
a,
b,
c
);
...
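Treating RS as a regex such as ^$ is a GNU awk feature (POSIX leaves a multi-character RS unspecified), so it may not be available on an old system. A sketch that assumes only a POSIX awk builds the whole file up in a variable instead and runs the same gsub() once in the END block; the file name input.txt is illustrative.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# Append every line plus its newline to buf; at the end, remove each comma
# that is directly followed by a newline and ");", then print buf as-is.
awk '{ buf = buf $0 "\n" }
     END { gsub(/,\n\);/, "\n);", buf); printf "%s", buf }' input.txt
```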

James Brown

Martin Kealey · Answer 3 · 2022-10-21T13:07:08.400

Using sed there are broadly two approaches:

Keep multiple lines in the pattern space; or
Keep the previous line in the hold space.

Using just the pattern space gives a very concise version:

sed 'N; s/,[[:space:]]*\n*[[:space:]]*)/)/; P; D'

This relies on the pattern space being able to hold multiple lines, and being able to match the newline with \n. Not all versions of sed can do this, but GNU sed can.

This also relies on the implicit behaviours of N, P, and D, which change depending on when end-of-input is reached. Read man sed for the gory details.

Unrolled to one command per line, this becomes:

sed '
    N
    s/,[[:space:]]*\n*[[:space:]]*)/)/
    P
    D
'
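With the replacement ")", the newline is removed along with the comma, so on the question's sample the third line of each block comes out as "c);" rather than "c" followed by ");" on its own line. If the layout matters, one variant (my own adjustment, assuming GNU sed) captures everything after the comma and puts it back with \1; the scratch file name is illustrative.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# N    : append the next line, so the pattern space holds two lines
# s    : delete a comma followed (possibly across the newline) by ")",
#        keeping the whitespace/newline and ")" via the \1 back-reference
# P; D : print and drop the first line, then loop with the second one
sed 'N; s/,\([[:space:]]*\n*[[:space:]]*)\)/\1/; P; D' input.txt
```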

If you have only a POSIX version of sed available, you'll need to use the hold space as well. In this case the idea is that when you see a line starting with ) in the pattern space, you edit the previous line, which is in the hold space, to remove its comma:

  sed '1 { h; d; }; /^)/ { x; s/,[[:space:]]*$//; x; }; x; $ { p; x; s/,[[:space:]]*$//; }'

Unrolled, that becomes:

  sed '
    1 {
      h
      d
    }
    /^)/ {
      x
      s/,[[:space:]]*$//
      x
    }
    x
    $ {
      p
      x
      s/,[[:space:]]*$//
    }
  '

Breaking that apart: what follows is a sed script, so put single quotes around it and sed in front of it:

  sed '

Start by copying the first line from the pattern space to the hold space, and then deleting the pattern space (which forces a skip to the next line):

    1 {
      h
      d
    }

For each line that starts with ')', swap the pattern space and hold space (so you now have the previous line in the pattern space), remove the trailing comma (if any), and then swap back again:

    /^)/ {
      x
      s/,[[:space:]]*$//
      x
    }

Now swap the pattern space with the hold space, so that the hold space holds the current line and the pattern space holds the previous line:

    x

Normally the contents of the pattern space are sent to output when the end of the script is reached, but there is one more case to take care of first.

On the last line, print the previous line, then swap to retrieve the last line and (because we reach the end of the script) print it too. This also removes a trailing comma from the last line; that's optional, so remove the s command in the following block if you don't want it.

    $ {
      p
      x
      s/,[[:space:]]*$//
    }

Upon reaching the end of the sed script, the pattern space will be printed; so there's no "p" at the end.

As mentioned before, close the quote from the beginning:

  '
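To try the hold-space version on the question's sample, the sketch below runs the unrolled script unchanged on a scratch copy of that input (the file name is an assumption). If you happen to have GNU sed for testing, adding its --posix option, which disables GNU extensions, is a rough way to confirm nothing GNU-specific is needed.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# The hold space always carries the previous line; a line starting with ")"
# strips the trailing comma from that held line before it is printed.
sed '
  1 {
    h
    d
  }
  /^)/ {
    x
    s/,[[:space:]]*$//
    x
  }
  x
  $ {
    p
    x
    s/,[[:space:]]*$//
  }
' input.txt
```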

Note: if you need to look ahead more than one line, then instead of using "x" to swap a single line, use "H;g" to append to the hold space and copy the hold space back to the pattern space, then "P;D" to print and remove everything up to the first newline. (H, P and D are standard POSIX sed commands, not GNU extensions.)

I tried both approaches and they did not work for me. What am I missing? I have tried all the answers in this post. My file is: CREATE TABLE table1 ( col1 bigint DEFAULT 0 NOT NULL, col2 smallint, col3 bigint, ) WITH (fillfactor='70'); CREATE TABLE table2 ( cola bigint , colb character varying(19), ); — GordyCA, Oct 18 '22 at 17:17
@GordyCA do you perhaps have something other than a line break between the comma and the close bracket? Are the lines indented? Is there whitespace on the ends of lines? — Martin Kealey, Oct 21 '22 at 12:58

Renaud Pacalet · Answer 4 · score 0 · 2022-02-17T06:05:00.927

This should work with GNU sed and BSD sed on the shown input:

sed -e ':a' -e '/,\n);$/!{N' -e 'ba' -e '}' -e 's/,\n);$/\n);/' file.txt

We concatenate lines in the pattern space until it ends with ,\n);. Then we delete the comma, print (the default) and restart the cycle with a new line.

Simpler and more readable version with GNU sed (which you do not have):

sed ':a;/,\n);$/!{N;ba};s/,\n);$/\n);/' file.txt

edited Feb 17 '22 at 06:05

answered Feb 17 '22 at 05:59

Renaud Pacalet


Hmmm I'm not sure why but the first example didn't seem to work for me. – sharkbites Feb 17 '22 at 06:22
What does _didn't seem to work for me_ mean? I just tested on the exact input example you provided and it worked as expected with both sed. – Renaud Pacalet Feb 17 '22 at 06:24
I also did, and triple-checked I got all the characters right. I'm getting: heading1(a,b,c,dn); heading2(e,f,g,hn); (and before you ask - yes, I escaped the n's). I guess it just doesn't work with my version of sed – sharkbites Feb 17 '22 at 06:30
Interesting. What version of sed is this? I don't understand how the (excellent) solution you accepted, and that also uses `\n`, works, and I'd like to understand. – Renaud Pacalet Feb 17 '22 at 06:51
The first solution works for me too, but notice that my solution uses `\n` in the search pattern, while this solution uses it in the replacement string. I remember a version of sed I used to have that could handle the one but not the other. I'll bet this solution would work for the OP if the `\n` were replaced with an actual line feed in the middle of the sed script. (I hated that version of sed for this very reason.) – Beta Feb 17 '22 at 12:45
@Beta Wow! Thanks for the explanation. Never thought that such a weird `sed` could exist. – Renaud Pacalet Feb 17 '22 at 12:52
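Beta's suggestion can be sketched by keeping \n in the search pattern but writing the newline in the replacement as a backslash followed by an actual line feed, which is how POSIX specifies inserting a newline in s replacement text. This has not been tried on the OP's sed; the scratch file name is an assumption.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# Renaud's loop: keep appending lines (N) until the buffer ends in ",\n);".
# The replacement uses backslash + real line feed instead of "\n".
sed -e ':a' -e '/,\n);$/!{N' -e 'ba' -e '}' -e 's/,\n);$/\
);/' input.txt
```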

score 0 · Answer 5 · answered Feb 17 '22 at 06:15


Using awk:

awk '
$0==");" {sub(/,$/, "", l)}
FNR!=1 {print l}
{l=$0}
END {print l}'
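The script keeps the previous line in l and prints it one line late, so it can still drop that line's trailing comma when the current line is exactly );. A usage sketch on the question's sample (the scratch file name is an assumption); it relies only on standard awk features, which may suit an old target system.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

awk '
$0==");" {sub(/,$/, "", l)}   # current line closes a block: fix previous line
FNR!=1 {print l}              # print the previous line, one line late
{l=$0}                        # remember the current line
END {print l}                 # flush the final line
' input.txt
```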

answered Feb 17 '22 at 06:15

dan


score 0 · Answer 6 · answered Feb 18 '22 at 14:13

This might work for you (GNU sed):

sed '/,$/{N;/);$/Ms/,$//M;P;D}' file

If a line ends with a comma, fetch the next line and if this ends in );, remove the comma.

Either way, print and delete the first of the two lines (P;D) and repeat with the line that remains.
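A quick run on the question's sample, assuming GNU sed: the M modifier, which lets $ match at the end of each line inside a multi-line pattern space, is a GNU extension. The scratch file name is illustrative.

```shell
#!/bin/sh
# Sample input from the question (input.txt is an illustrative name).
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' '' 'some more text' '' \
  'heading2(' 'd,' 'e,' 'f,' ');' > input.txt

# For a line ending in ",": pull in the next line; if that one ends in ");",
# remove the comma at the end of the first line (M = per-line $ matching).
sed '/,$/{N;/);$/Ms/,$//M;P;D}' input.txt
```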
