The answer to the question "Split bash string by newline characters" seems to say that newlines are the default delimiter, so we should change the delimiter to null and split on that instead. Why doesn't splitting on the newline work? What I would expect (and desire, in my use case) is a 1:1 correspondence between lines and \n in the input string (so a \n must be added to get the last line), with blank lines, leading/embedded whitespace, etc. preserved.

Quoting from Mark Gerolimatos, who seems to be asking the same question:

In OS-X/Macland, you have to use bash 3.2 (or at least without updating BASH). Thus the mysterious read -rd ' ' must be used (and works!) the online manual page I found is pretty cryptic about this (ss64.com/bash/read.html)...it's pretty mind-bending...does it mean "turn off \n, and then use emptiness as the delimiter?"

Comissar
  • The default delimiter is in the value of `IFS`, and it is not just `\n`: see the output of `printf '%q\n' "$IFS"`. It is a space, tab, and newline, and the `read` builtin strips leading and trailing spaces/tabs by default, hence the use of `IFS=` to disable that feature. What is it you're trying to do? – Jetchisel Jul 02 '20 at 00:46
  • Show some code that isn't working. We can't help otherwise. Good luck. – shellter Jul 02 '20 at 00:47
  • The code works; the question was 'why'? – Comissar Jul 04 '20 at 02:49
  • My confusion was on the two uses of 'delimiter', and on all the 'helpful' things bash does that are not obvious - splitting without explicitly asking, the implicit newline on <<<, discarded repeat separators, etc. I upvoted both answers, because the two together gave me the context that I needed. Thanks! – Comissar Jul 04 '20 at 02:58

2 Answers

The confusion happens because read operates with two delimiters:

  1. How much to read
  2. How to split what you read

By default, this is:

  1. Read until a linefeed (i.e. one line)
  2. Split on whitespace (i.e. into words)

If you just set IFS=$'\n' you can see the problem:

  1. Read until a linefeed (again, one line)
  2. Split on linefeed (which doesn't do anything, because one line necessarily can't consist of multiple lines)
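A minimal sketch of that failure, assuming a made-up three-line sample string in x (the sample data is not from the question):

```bash
#!/usr/bin/env bash
# x is a hypothetical sample; any multi-line string behaves the same way.
x=$'one\ntwo\nthree'

# -d is left at its default (newline), so read stops after the first line.
# IFS=$'\n' then has nothing left to split on.
IFS=$'\n' read -r -a y <<<"$x"

echo "${#y[@]}"   # prints 1
echo "${y[0]}"    # prints one
```

Only the first line ever reaches the splitting step, so the array ends up with a single element.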

What you want to do instead is:

  1. Read all input
  2. Split on linefeed

read -d '' causes read to read until an ASCII NUL, which is not found in normal text, and is therefore a workable proxy for "read all text input".
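One side effect worth knowing about: a here-string contains no NUL, so read reaches end of input before it finds its delimiter and returns a non-zero status even though the array has been filled. A small sketch, again with a made-up sample string:

```bash
#!/usr/bin/env bash
# x is a hypothetical sample string.
x=$'one\ntwo\nthree'

# Capture the exit status explicitly so the script also works under `set -e`.
if IFS=$'\n' read -r -d '' -a y <<<"$x"; then
  status=0
else
  status=$?
fi

echo "${#y[@]}"   # prints 3
[ "$status" -ne 0 ] && echo "read hit end of input before finding a NUL"
```

In scripts that run with `set -e`, appending `|| true` to the read command (or testing the status as above) avoids an unexpected exit.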

that other guy

Just to make sure we're on the same page, this is the code in that answer:

IFS=$'\n' read -rd '' -a y <<<"$x"

where x is the variable to read from and y is the array variable to populate with the lines of x.

Why doesn't splitting on the newline work?

It does; the IFS=$'\n' is telling read to split on newlines.

If you're asking why you can't write read -rd $'\n' -a y, then: the delimiter indicated by -d tells read where to stop reading. So if you set that to a newline, then read will only read one line!

What I would […] desire […] is that […] blank lines […] would be preserved.

Yes, it's annoying that initial or consecutive occurrences of the separator get discarded, such that x=$'\na\n\nb' gives the same result as x=$'a\nb'.
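This happens because newline, like space and tab, counts as IFS whitespace, so leading and repeated occurrences collapse during splitting. A quick sketch using the first sample value from above:

```bash
#!/usr/bin/env bash
x=$'\na\n\nb'

# `|| true` swallows the non-zero status read returns at end of input.
IFS=$'\n' read -r -d '' -a y <<<"$x" || true

echo "${#y[@]}"             # prints 2
printf '<%s>\n' "${y[@]}"   # prints <a> then <b>
```

Both the leading blank line and the blank line between a and b are gone.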

To satisfy your requirements, you'll need to use a slightly different approach, where you call read once per line:

y=()
while IFS= read -r -d $'\n' ; do
  y+=("$REPLY")
done <<< "${x%$'\n'*}"

In this approach, we tell read to just take the line as-is and not split it (hence IFS=), and we handle the looping ourselves.

Note that the "${x%$'\n'*}" bit strips off the last newline and everything after it, per your requirement to ignore the last line if it doesn't have a newline. (The <<< bit implicitly adds a newline.)
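To see the loop at work, here is a sketch run against a made-up sample (not from the question) that has a leading blank line, leading whitespace, an embedded blank line, and an unterminated last line:

```bash
#!/usr/bin/env bash
# x is a hypothetical sample; the final "no newline" piece has no trailing \n.
x=$'\n  a\n\nb\nno newline'

y=()
while IFS= read -r -d $'\n'; do
  y+=("$REPLY")           # REPLY holds the line exactly as read
done <<<"${x%$'\n'*}"     # drop the last newline and everything after it

echo "${#y[@]}"             # prints 4
printf '<%s>\n' "${y[@]}"   # prints <>, <  a>, <>, <b>
```

Each of the four \n characters in x yields exactly one element, and the unterminated tail is dropped. One caveat: if x contains no newline at all, the `${x%$'\n'*}` pattern does not match, so the whole string comes through as a single line.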

ruakh