3

I want to create a bash array from NUL-separated input (from stdin).

Here's an example:

## Let's define this for clarity
$ hd() { hexdump -v -e '/1 "%02X "'; echo ;}
$ echo -en "A B\0C\nD\0E\0" | hd
41 20 42 00 43 0A 44 00 45 00

So this is my input.

Now, working with NUL works fine when not using the -a option of the read command:

$ while read -r -d '' v; do echo -n "$v" | hd; done < <(echo -en "A B\0C\nD\0E\0")
41 20 42 
43 0A 44 
45 

We get the correct values. But I can't store these values using -a:

$ read -r -d '' -a arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A" [1]="B")'

Which is obviously not what I wanted. I would like to have:

$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'

Is there a way to do this with read -a and, if not, why doesn't it work? Do you know a simple way to do this (avoiding the while loop)?

eYe
vaab
  • Why avoid the while loop? The while loop is the FAQ-approved, irc.freenode.org/#bash-blessed Right Way to do this. – Charles Duffy May 05 '14 at 13:58
  • ...mind you, I'd very much prefer that `readarray` or `mapfile` supported NUL delimiters, but as of Bash 4.3, they don't. Perhaps someone should ask Chet if a patch would be accepted... – Charles Duffy May 05 '14 at 13:59
  • I'm using a ``while`` loop. I was just wondering why this didn't work, and wanted to make sure I wasn't missing something obvious. Any details (bug report, source code link, OS limitations, sourced acknowledgement of this limitation) that would give more information about the 'why'? – vaab May 05 '14 at 14:08
  • `-d` provides the delimiter used by `read -a` to tell it **when to stop reading entirely, not when to stop reading a single entry**. Does that make the behavior clearer? – Charles Duffy May 05 '14 at 14:14

4 Answers

7

read -a is the wrong tool for the job, as you've noticed; it splits fields on the characters in IFS, and IFS cannot contain NUL. The appropriate technique is given in BashFAQ #1:

arr=()
while IFS= read -r -d '' entry; do
  arr+=( "$entry" )
done
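
As a minimal sketch of how that loop can be fed and checked, the snippet below pipes three NUL-terminated records into it through process substitution. The sample records are hypothetical and simply mirror the question's input; `printf '%s\0'` is used instead of `echo -en` to produce the NUL bytes.

```shell
# Hypothetical sample input: three NUL-terminated records, one of which
# contains an embedded newline.
arr=()
while IFS= read -r -d '' entry; do
  arr+=( "$entry" )
done < <(printf '%s\0' "A B" $'C\nD' "E")

# Show the resulting array: three elements, whitespace and newline preserved.
declare -p arr
```

Note that a final record without a terminating NUL would be dropped by this loop, because read returns a non-zero status at end of input even though it has filled the variable.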

In terms of why read -d '' -a is the wrong tool: -d gives read an argument to use to determine when to stop reading entirely, rather than when to stop reading a single element.

Consider:

while IFS=$'\t' read -d $'\n' -a words; do
  ...
done

...this will read words separated by tab characters, until it reaches a newline. Thus, even with read -a, using -d '' will read until it reaches a NUL.
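
To make the distinction concrete, here is a small sketch with made-up input: `-d` decides where read stops, and IFS decides how the text read up to that point is split into array elements. The variable names and sample data are illustrative only.

```shell
# Tab-separated fields, newline as the stop character: only the first line
# is consumed, and it is split on tabs.
IFS=$'\t' read -r -d $'\n' -a words < <(printf 'a\tb\nc\td\n')
declare -p words   # two elements, "a" and "b"; the second line is never read

# Same mechanism with -d '': read stops at the first NUL, and the default
# IFS then splits "A B" on the space, which is the result seen in the question.
read -r -d '' -a arr < <(printf '%s\0' "A B" $'C\nD' "E")
declare -p arr
```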

What you want, to read until no more content is available and split by NULs, is not a '-d' of NUL, but no end-of-line character at all (and an empty IFS). This is not something read's usage currently makes available.

Charles Duffy
  • You probably wanted to point to [BashFAQ #5](http://mywiki.wooledge.org/BashFAQ/005), as #1 doesn't speak about arrays. – vaab May 05 '14 at 14:13
  • @vaab, #1 speaks *directly* to reading NUL-delimited input. Look for the example describing correct use with `find -print0`. – Charles Duffy May 05 '14 at 14:15
  • There's only one mention of the word 'array' in #1, and it says to go to #5. I feel #5 answers my concerns, not #1. I definitely know how to read NUL-separated content with ``read``, as shown in the question itself. – vaab May 05 '14 at 14:22
  • @vaab, it doesn't say "array", but it does speak to NUL-delimited text. Search for `-print0`. – Charles Duffy May 05 '14 at 14:23
3

bash-4.4-alpha added a -d option to mapfile:

The `mapfile' builtin now has a -d option to use an arbitrary character as the record delimiter, and a -t option to strip the delimiter as supplied with -d.

https://tiswww.case.edu/php/chet/bash/CHANGES

Using this, we can simply write:

mapfile -t -d '' arr < <(echo -en "A B\0C\nD\0E\0")
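
For scripts that may run on older shells, one possible approach (a sketch, not something the bash documentation prescribes) is to check `BASH_VERSINFO` and fall back to the while-read loop when `mapfile -d` is unavailable. The sample records are hypothetical and mirror the question's input.

```shell
# mapfile -d needs bash 4.4 or newer; fall back to a read loop otherwise.
if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
  mapfile -t -d '' arr < <(printf '%s\0' "A B" $'C\nD' "E")
else
  arr=()
  while IFS= read -r -d '' entry; do
    arr+=( "$entry" )
  done < <(printf '%s\0' "A B" $'C\nD' "E")
fi

# Either branch should yield the same three elements.
declare -p arr
```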
Robin A. Meade
  • Useful addendum. That said, I'd suggest `printf '%s\0' "A B" C D E` in place of the `echo`, btw -- even on bash, `echo -e` isn't always available (for example, it just prints `-e` on output whenever both `xpg_echo` and `posix` flags are active -- and the former can be made default at compile time). – Charles Duffy Nov 10 '17 at 17:14
0

If anyone wonders, here's the function (using while) that I use to store values from a NUL-separated stdin:

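read_array () {
    # Store each NUL-terminated record from stdin in the array named by $1.
    local var="$1" value
    local i=0
    # IFS= preserves leading and trailing whitespace; -d '' stops at each NUL.
    while IFS= read -r -d '' value; do
        printf -v "${var}[$i]" "%s" "$value"
        i=$((i + 1))
    done
}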

It can then be used quite cleanly:

$ read_array arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'
vaab
  • The [bracket form of arithmetic expansion is deprecated](http://stackoverflow.com/a/2415777/1157557). You could remove that line completely and increment `i` on the previous line: `"$var[i++]"`. – Robin A. Meade Jan 15 '17 at 08:27
0

Here's a simplification of @vaab's function. It uses bash 4.3's nameref feature:

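read_array () {
  local -n a=$1   # nameref: "a" aliases the caller's array named by $1
  local value
  # IFS= preserves leading and trailing whitespace in each record
  while IFS= read -r -d '' value; do
    a+=("$value")
  done
}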

Test:

test_it () {
  local -a arr
  read_array arr < <(echo -en "A B\0C\nD\0E\0")
  declare -p arr
}
test_it
Robin A. Meade