Replacing a specific whitespace pattern in sed with a newline, when it doesn't have a preceding colon?

Question

I am trying to parse the following line using sed, replacing a whitespace with a newline only when the whitespace isn't preceded by a colon.

For example, I'm using the following input to be processed:

label1: output label2: output2 label3: "output3" label4: output4 { label5: output5 label6: output6 } label7: output7 { { { label8: output8 } label9: output9 } } label10: output10

I'd like the regex to replace any whitespace that doesn't have a colon before it with a newline, so the output would be something like this:

label1: output
label2: output2
label3: "output3"
label4: output4
{
label5: output5
label6: output6
}
label7: output7
{
{
{
label8: output8
}
label9: output9
}
}
label10: output10

When I try the following regex in cat file | sed 's/[^:A-Za-z0-9\"] /%/g' | tr '%' '\n', I get the output below, which is close but doesn't achieve the goal:

    label1: output label2: output2 label3: "output3" label4: output4
    label5: output5 label6: output6
    label7: output7


    label8: output8
    label9: output9

    label10: output10

I've also tried cat file | sed 's/[^:] /%/g' | tr '%' '\n', which results in:

label1: outpu
label2: output
label3: "output3
label4: output

label5: output
label6: output

label7: output



label8: output

label9: output


label10: output10

It looks like the regex also consumes the character before the space, so every character that's not a : and is followed by a space gets replaced by the newline as well.

You want to avoid the [useless use of `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) — tripleee, Nov 16 '18 at 06:04
The regular expression for "not a colon (or a newline)" is `[^:]`; it's not clear from your question if you also want newlines followed by space to be replaced. — tripleee, Nov 16 '18 at 06:06

Amol · score 0 · Answer 1 · 2018-11-16T06:21:17.770

This should do it:

sed -E 's/([^:]) /\1\n/g' file

Output:

label1: output
label2: output2
label3: "output3"
label4: output4
{
label5: output5
label6: output6
}
label7: output7
{
{
{
label8: output8
}
label9: output9
}
}
label10: output10
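To see why the capture group matters, here is a small comparison sketch. The function names `without_group` and `with_group` are made up for illustration, and the second one relies on GNU sed for both `-E` and `\n` in the replacement. A pattern such as `[^:] ` matches two characters (the non-colon and the space), so replacing the whole match drops the character before the space. Capturing it and putting it back with `\1` keeps it.

```shell
# Hypothetical helpers for illustration; both read from standard input.

# No capture group: the character before the space is part of the match,
# so it is replaced together with the space.
without_group() {
  sed 's/[^:] /%/g' | tr '%' '\n'
}

# Capture group: \1 restores the character before the space.
# Assumes GNU sed (-E and \n in the replacement are not POSIX).
with_group() {
  sed -E 's/([^:]) /\1\n/g'
}

sample='label1: output label2: output2'

printf '%s\n' "$sample" | without_group
# label1: outpu
# label2: output2

printf '%s\n' "$sample" | with_group
# label1: output
# label2: output2
```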

Cheers!

edited Nov 16 '18 at 06:21

answered Nov 16 '18 at 05:52


Welcome to StackOverflow! Your answer needs more explanation about how the code works to be a good answer. – Tân Nov 16 '18 at 05:56
The `-r` option is not portable; not all `sed` variants support the escape `\n` for literal newline. – tripleee Nov 16 '18 at 06:08
Changed the -r option to -E. Not sure how to address your comment about \n – Amol Nov 16 '18 at 06:21
`-E` is also not portable; both of these are outside of the POSIX spec. There is a way to embed a literal newline in a `sed` script but how exactly to do that isn't completely portable, either (typically, backslash-newline works most places). – tripleee Nov 16 '18 at 06:33
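As a rough sketch of the backslash-newline approach mentioned in the last comment: the version below uses only basic regular expression syntax (no `-E` or `-r`) and puts a literal newline in the replacement instead of `\n`. The function name `split_labels` is made up for illustration, and as the comment says, this form works in most places rather than everywhere.

```shell
# Hypothetical helper: split on every space that is not preceded by a colon.
# The backslash at the end of the first script line escapes a literal
# newline in the replacement text. The "/g'" line must start in column 0,
# otherwise the leading spaces would become part of the replacement.
split_labels() {
  sed 's/\([^:]\) /\1\
/g'
}

printf '%s\n' 'label1: output { label5: output5 }' | split_labels
# label1: output
# {
# label5: output5
# }
```

Inside single quotes the shell passes the backslash and the newline through to sed unchanged, which is why no extra quoting is needed.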

score 0 · Answer 2 · answered Nov 16 '18 at 08:32

This might work for you (GNU sed):

sed 'G;:a;s/\([^: ]\) \(.*\(.\)\)/\1\3\2/;ta;s/.$//' file

Append a newline to the current line with the G command, which appends a newline followed by the hold space (empty by default) to the pattern space. Then, using pattern matching and back-references, loop over the line, replacing each space that follows a non-space, non-colon character with a copy of the appended newline. When there are no further matches, remove the trailing newline artefact and print the line.
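The steps described above can be laid out one command per line, which may make the loop easier to follow. This is only a sketch of the same script, assuming GNU sed (it relies on `.` matching the newline in the pattern space and on `#` comments inside the script); the function name `split_labels_loop` is made up for illustration.

```shell
# Hypothetical helper wrapping the one-liner, one sed command per line.
split_labels_loop() {
  sed '
    # Append a newline plus the (empty) hold space to the pattern space.
    G
    :a
    # \1 = character before the space, \2 = rest of the pattern space,
    # \3 = its last character, which is the newline appended by G.
    # The space between \1 and \2 is replaced by a copy of that newline.
    s/\([^: ]\) \(.*\(.\)\)/\1\3\2/
    # Repeat while the substitution above keeps succeeding.
    ta
    # Drop the newline that G appended.
    s/.$//
  '
}

printf '%s\n' 'label1: output { label5: output5 }' | split_labels_loop
```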

The same solution is easier to read with the -r option (GNU sed only), which removes many of the backslashes:

sed -r 'G;:a;s/([^: ]) (.*(.))/\1\3\2/;ta;s/.$//' file

Also as pointed out, the optimal solution would be:

sed 's/\([^: ]\) /\1\n/g' file
