21

I have a file that occasionally has split lines. A split is signaled by the continuation line starting with '+' (possibly preceded by spaces).

line 1
line 2
  + continue 2
line 3
...

I'd like to join the split line back:

line 1
line 2 continue 2
line 3
...

using sed. I'm not clear on how to join a line with the preceding one.

Any suggestions?

stevesliva
Remo.D

6 Answers

28

This might work for you:

sed 'N;s/\n\s*+//;P;D' file

These are actually four commands:

  • N
    Append line from the input file to the pattern space
  • s/\n\s*+//
    Remove newline, following whitespace and the plus
  • P
    Print the pattern space up to the first newline
  • D
    Delete the pattern space up to the first newline, i.e. the part that was just printed
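The two-line window described above can be traced on the sample input from the question. This is a minimal sketch, assuming GNU sed (for `\s` and for `N` printing the pattern space when no next line exists); `join_once` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper wrapping the command from this answer (GNU sed assumed).
join_once() {
    sed 'N;s/\n\s*+//;P;D'
}

# Trace of the pattern space for the input below:
#   "line 1\nline 2"          no match; P prints "line 1"; D leaves "line 2"
#   "line 2\n  + continue 2"  joined to "line 2 continue 2"; P prints it;
#                             D finds no newline, so it acts like d
#   "line 3"                  N has no next line; GNU sed prints it and exits
result=$(printf '%s\n' 'line 1' 'line 2' '  + continue 2' 'line 3' | join_once)
printf '%s\n' "$result"
```

Because `D` restarts the cycle with whatever follows the first newline, every line gets a chance to be the first line of the window, whether it sits on an odd or an even line number.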

The relevant manual page parts are

To join one or more + lines, use:

sed ':a;N;s/\n\s*+//;ta;P;D' file
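As a rough check of the looping version, the sketch below feeds it several consecutive continuation lines, including one at the very end of the input. It assumes GNU sed; `join_all` is a hypothetical wrapper name, not something from the answer.

```shell
# Hypothetical helper wrapping the looping command (GNU sed assumed).
# ta jumps back to :a after every successful join, so N keeps pulling in
# lines until the next one is not a continuation.
join_all() {
    sed ':a;N;s/\n\s*+//;ta;P;D'
}

joined=$(printf '%s\n' \
    'line 1' \
    'line 2' \
    '  + continue 2' \
    '  + continue 2 even more' \
    'line 3' \
    '+ continued' | join_all)
printf '%s\n' "$joined"
```

With this input the three physical lines starting at "line 2" come out as one line, and the trailing "+ continued" is attached to "line 3".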
potong
  • Nice, this even works in non-GNU sed if you replace `\s` with a space! +1. – ghoti Apr 03 '12 at 22:25
  • @AquariusPower, yes, that will match spaces, but it will also match tabs, which of course potong's solution of `\s` matches as well. The OP stated that a continuation was denoted by a *`+` possibly preceded by spaces*, but he said nothing of tabs. Probably doesn't matter, but you never know. – ghoti Nov 22 '15 at 03:52
  • @ghoti I proposed that because I had a lot of trouble with a single " "; later I found that matching all blanks helped avoid a lot of recoding, exactly because "we never know", as you said :) – Aquarius Power Nov 22 '15 at 04:04
  • Since I just needed this, I also wanted to understand how it works, therefore the edit. If you don't like the change, feel free to revert. – Olaf Dietsche Mar 22 '18 at 12:53
  • Why not just `sed 'N;s/\n\s*+//'`? – xebeche Aug 07 '18 at 07:46
  • @xebeche By using `P` followed by `D`, the pattern space maintains a two-line window from the start to the end of the file. If the continuation line always appears on an even-numbered line then your solution would work, but it would just print the file as is when continuation lines fall on odd-numbered lines, and a mixture of the two if they appeared anywhere. Try your solution using the above test data. – potong Aug 07 '18 at 08:28
  • Thanks! Never really understood how `N`, `P` + `D` work together so nicely, it's almost magic. Got it now. BTW, you edited the answer + removed `$!` but it is indeed required b/c `D` becomes `d` if pattern space contains no newline. – xebeche Aug 07 '18 at 16:51
  • @xebeche I removed the `$!` because it is superfluous. Try running it against a test file and you will see all edges are covered. – potong Aug 07 '18 at 16:59
  • I know now, sorry. It is superfluous with `sed`, however, it is not with `sed -n`. – xebeche Aug 07 '18 at 17:05
  • @stevesliva To handle 1 or more + lines after a starting line, use `sed ':a;N;s/\n\s*+//;ta;P;D' file` – potong Jul 20 '23 at 23:42
4

Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:

perl -0777 -pe 's/\n\s*\+//g' input
William Pursell
4

A different use of the hold space with POSIX sed: load the entire file into the hold space before merging lines. (Note that \s is a GNU extension; [[:space:]] is the portable spelling.)

sed -n '1x;1!H;${g;s/\n\s*+//g;p}'

  • 1x on the first line, swap the line into the empty hold space
  • 1!H on non-first lines, append to the hold space
  • $ on the last line:
    • g get the hold space (the entire file)
    • s/\n\s*+//g remove each newline, any following whitespace, and the +
    • p print everything

Input:

line 1
line 2
  + continue 2
  + continue 2 even more
line 3
+ continued

becomes

line 1
line 2 continue 2 continue 2 even more
line 3 continued

This (or potong's answer) might be more interesting than a sed -z implementation if you want other commands to manipulate the data: you can simply insert them before 1!H, whereas sed -z immediately loads the entire file into the pattern space, which means you aren't manipulating single lines at any point. The same applies to perl -0777.

In other words, if you also want to eliminate comment lines starting with *, add /^\s*\*/d to delete them:

sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'

versus:

sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'

The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
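The line-by-line variant can be tried on a small sample, as sketched below. It assumes GNU sed, and `join_skip_comments` is a hypothetical wrapper name. Two limits of the command as written are worth keeping in mind: a comment on the first line is swapped into the hold space by 1x before the /^\s*\*/ test runs, so it is kept, and a comment on the last line is deleted before the $ block runs, so nothing is printed at all. The sample avoids both positions.

```shell
# Hypothetical helper wrapping the hold-space command with comment removal.
# Assumption: the comment line is neither the first nor the last input line.
join_skip_comments() {
    sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
}

filtered=$(printf '%s\n' \
    'line 1' \
    '* a comment' \
    'line 2' \
    '  + continue 2' | join_skip_comments)
printf '%s\n' "$filtered"
```

The comment never reaches the hold space, so the final substitution only has to deal with continuation lines.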

But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.

Footnote for internet searches: This is SPICE netlist syntax.

stevesliva
3

I'm not partial to sed so this was a nice challenge for me.

sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'

In awk this is approximately:

awk '
    NR == 1 {hold = $0; next}
    /^ *\+/ {$1 = ""; hold=hold $0; next}
    {print hold; hold = $0}
    END {if (hold) print hold}
'

If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
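Assigning to $1 makes awk rebuild $0 with single spaces between fields, which collapses any runs of spaces inside the continuation text. A possible variation, sketched here and not part of the answer above, strips the leading blanks and + with sub() so the rest of the line is left untouched. `join_awk` is a hypothetical wrapper name, and the END rule tests NR instead of hold so that a final line such as "0" is still printed.

```shell
# Hypothetical helper: same hold-and-flush idea as the awk script above,
# but sub() edits $0 in place without re-splitting the fields.
join_awk() {
    awk '
        NR == 1   { hold = $0; next }
        /^ *[+]/  { sub(/^ *[+]/, ""); hold = hold $0; next }
                  { print hold; hold = $0 }
        END       { if (NR) print hold }
    '
}

awk_out=$(printf '%s\n' 'line 1' 'line 2' '  + a   b' 'line 3' | join_awk)
printf '%s\n' "$awk_out"
```

Here the three spaces between "a" and "b" survive the join, whereas the $1 = "" form would reduce them to one.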

glenn jackman
  • Note that this is GNU-sed-only. The awk version is way more readable of course, but it also suffers because when you `$1 = "";`, you tell awk to rewrite $0 with its default OFS. That may not be important, but it should be remembered in case someone wants to use this solution. – ghoti Apr 03 '12 at 22:22
3

You can use Vim in Ex mode:

ex -sc g/+/-j -cx file
  1. g global search
  2. - select previous line
  3. j join with next line
  4. x save and close

2

A solution for versions of sed that can read NUL-separated data, such as GNU sed with -z:

sed -z 's/\n\s*+//g'

Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:

line 1
line 2
  + continue 2
  + continue 2 even more
line 3

becomes

line 1
line 2 continue 2 continue 2 even more
line 3
xebeche