
Problem: use sed to replace a regex that matches \n (a newline).

Solution: there are many similar answers [1][2][3][4], and many more that I won't link. All of them suggest creating a label :a, appending the next line to the pattern space with N, branching back to :a unless on the last line ($!ba), and then running the actual command.
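A minimal sketch of that loop idiom, assuming GNU sed (some other seds do not accept a label followed by `;` on one line):

```shell
#!/bin/sh
# Classic loop idiom (assumes GNU sed):
#   :a    define a label named "a"
#   N     append a newline plus the next input line to the pattern space
#   $!ba  unless this is the last line, branch back to label "a"
# Once the whole input is in the pattern space, the s command can see
# (and replace) the embedded newlines.
seq 3 | sed ':a;N;$!ba;s/\n/ /g'
```

This prints `1 2 3`. The loop is needed because, by default, sed strips the `\n` terminator from each line before running the script, so a plain `s/\n/ /g` never sees one.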

That said, the GNU sed manual documents a -z option:

-z
--null-data
--zero-terminated

Treat the input as a set of lines, each terminated by a zero byte
(the ASCII ‘NUL’ character) instead of a newline. This option can
be used with commands like ‘sort -z’ and ‘find -print0’ to process
arbitrary file names. 
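A small sketch of the use case the manual describes, assuming GNU sed and using `printf` to simulate what `find . -print0` would emit so the input is predictable:

```shell
#!/bin/sh
# Two NUL-terminated records, like `find . -print0` output; note that the
# second file name contains a space.
# With -z, each NUL-terminated record is one "line", so ^ anchors to the
# start of every record and the leading ./ is stripped from each one.
# tr turns the NUL terminators into newlines only for display.
printf './a.txt\0./b c.txt\0' | sed -z 's|^\./||' | tr '\0' '\n'
```

This prints `a.txt` and `b c.txt` on separate lines. Here the NUL bytes are already present in the input; sed only uses them as record boundaries.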

First, for comparison, here is the naive approach:

$ seq 3 | sed 's/\n/ /g'
1
2
3

However, using this -z option:

$ seq 3 | sed -z 's/\n/ /g'
1 2 3
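One detail worth checking, assuming GNU sed: under -z the final `\n` printed by seq is also part of the single "line", so it is replaced too. Because the input contains no NUL, sed appends no terminator on output. The result is therefore `1 2 3 ` with a trailing space and no final newline. A sketch that makes this visible, plus one way to avoid the trailing space:

```shell
#!/bin/sh
# "1\n2\n3\n" is one NUL-less record, so every \n, including the last,
# becomes a space: the output is "1 2 3 ", i.e. 6 bytes.
seq 3 | sed -z 's/\n/ /g' | wc -c

# Delete the final newline first ($ anchors at the end of the pattern
# space), then replace the remaining ones. echo only adds a newline
# afterwards so the shell prompt is not glued to the output.
seq 3 | sed -z 's/\n$//; s/\n/ /g'; echo
```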

The Real Question: Why?

Given that it "merges" all the lines, I expected that I would have to use \0 instead of \n, since the documentation says:

Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character)

Since I couldn't find any post about this, I suspect I'm misunderstanding something. So what does -z really do, and why does it work?

yZaph

1 Answer


Using -z changes what sed considers to be a line. \n remains \n, but it no longer ends a line; the null character (written \x0 in sed) does. As there are no null bytes in the output of seq, the whole output is considered one line and processed in a single cycle (i.e. the s command replaces all the \n's with spaces in one go).
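One way to see this directly, assuming GNU sed, is to count how many lines sed thinks it read. The `$=` command prints the line number of the last line, and `-n` suppresses the normal auto-print:

```shell
#!/bin/sh
# Default: \n ends a line, so sed sees 3 lines.
seq 3 | sed -n '$='
# -z: only NUL ends a line, and seq emits none, so sed sees 1 line.
seq 3 | sed -z -n '$='
```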

Enlico
choroba
  • I didn't quite understand your answer regarding the `NUL` character. The documentation doesn't say that the input must contain this character. What I understood is that it puts a `\0` character at the end of each string while treating all the lines as one thing. So, is the documentation wrong, or just my understanding? – yZaph Sep 27 '18 at 13:51
  • Your understanding is wrong. `sort -z` and `find -print0` add the `\0` character, `sed -z` just treats `\0` as the end-of-line marker instead of the default `\n`. – choroba Sep 27 '18 at 13:57
  • @yZaph Sed just sees a byte stream, and by default, the `\n` byte is considered to mark the end of a line. With `-z`, sed looks for `\0` bytes to denote line ends instead, and `\n` is treated like any other character. I think you assume that it takes newline-delimited strings and replaces the `\n` with `\0`, but that's not the case. – Benjamin W. Sep 27 '18 at 13:58
  • Oh. I see. I understand it now. Thank you, guys. – yZaph Sep 27 '18 at 14:02
  • Coming to understand this `-z` option in sed. This is a good explanation – Kalib Zen Sep 28 '21 at 23:21
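A small sketch of the role swap described in these comments, assuming GNU sed (the `\x00` escape is a GNU extension). In default mode a NUL byte is an ordinary character inside the line and can be matched. Under -z it is the newline that becomes ordinary:

```shell
#!/bin/sh
# Default mode: \n ends the line, and the NUL in the middle is an
# ordinary character that the regex \x00 matches.
printf 'a\0b\n' | sed 's/\x00/ /g'

# -z mode: roles swapped. NUL ends the record and the \n in the middle
# is ordinary; tr only turns the trailing NUL into \n for display.
printf 'a\nb\0' | sed -z 's/\n/ /g' | tr '\0' '\n'
```

Both commands print `a b`.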