While trying to process a list of file and folder names correctly (see my other questions) by using a NUL character as the delimiter, I stumbled over a Bash behaviour that I don't understand:

When a string containing one or more NUL characters is assigned to a variable, the NUL characters are lost: they are ignored and not stored.

For example,

echo -ne "n\0m\0k" | od -c   # -> 0000000   n  \0   m  \0   k

But:

VAR1=`echo -ne "n\0m\0k"`
echo -ne "$VAR1" | od -c   # -> 0000000   n   m   k

This means that I would need to write that string to a file (for example, in /tmp) and read it back from there if piping directly is not desired or feasible.

When executing these scripts in Z shell (zsh) the strings containing \0 are preserved in both cases, but sadly I can't assume that zsh is present in the systems running my script while Bash should be.

How can strings containing \0 chars be stored or handled efficiently without losing any (meta-) characters?

antiplex
  • In case anybody's wondering why you can't store `\0`, it is the character used to delimit the end of a variable. So storing the NUL character is essentially the same as setting a variable to empty – Nick Bull Mar 10 '20 at 14:44

5 Answers

In Bash, you can't store the NUL character in a variable.

You may, however, store a plain hex dump of the data (and later reverse this operation again) by using the xxd command.

VAR1=`echo -ne "n\0m\0k" | xxd -p | tr -d '\n'`
echo -ne "$VAR1" | xxd -r -p | od -c   # -> 0000000    n  \0   m  \0   k
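
Where xxd is not available, the same idea can be sketched with od and Bash's builtin printf. This is a minimal sketch rather than part of the original answer: hex_encode and hex_decode are hypothetical helper names, and it assumes Bash's printf accepts \xHH escapes in its format string.

```bash
# Hypothetical helpers: hex-encode stdin, and decode a hex string back to bytes.
hex_encode() {
    # od prints space-separated hex bytes; strip the spaces and newlines.
    od -An -v -tx1 | tr -d ' \n'
}

hex_decode() {
    local hex escaped
    hex=$(cat)
    # Turn "6e00..." into "\x6e\x00..." and let printf expand the escapes.
    escaped=$(printf '%s' "$hex" | sed 's/../\\x&/g')
    printf "$escaped"
}

VAR1=$(printf 'n\0m\0k' | hex_encode)      # the variable holds only hex digits
printf '%s' "$VAR1" | hex_decode | od -c   # the NUL bytes are back
```

Because the variable holds nothing but hex digits, it is also safe to pass as a command-line argument.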
jeff
  • Nice one ) I was using VAR1="$(echo -ne 'n\0m\0k' | sed 's/\\/\\\\/g;s/\x0/\\0/g' )"; echo -ne "$VAR1" | od -c # -> 0000000 n \0 m \0 k Btw, you should not use -e on echo in outputting in your example and my example "may" be using less memory, but I think it is irrelevant. – XzKto Jul 04 '11 at 13:10
  • Thanks for that nice answer! why not use -e for echo? if -E (default) is active, \0 is interpreted as 2 characters ('\' and '0'), so i guess using -e should be fine (since \0 is just an escape for the NUL-char)? i agree that -e probably won't work in @XzKto 's solution... thanks anyway for this second approach! – antiplex Jul 04 '11 at 20:29
  • I meant that @jeff should not use -e in his example (in second echo), as it is completely useless but I was just nitpicking ) – XzKto Jul 05 '11 at 06:52
  • @jeff weird that i didn't recognize this earlier but i guess appending `tr -d '\n'` is not really necessary since `xxd -r -p` seems to slurp/remove the automatically added newlines, at least it does for me using version 1.10 in bash v4.2.24 and zsh v4.3.17 pls correct me if i have overlooked something here ;) – antiplex Apr 03 '13 at 11:00
  • printf might be better: let's say I want to handle a null delimited username and password. Furthermore, say the username and password are `user` and `new\nlines`, respectively. You'll find that using `echo -en "$username\0$password"` interprets `\n` as a new line rather than part of a string. What works better is `printf "%s\0%s" "$username" "$password"` or in the case of this question, `VAR1=$(printf "%s\0%s\0%s" n m k | xxd -p | tr -d '\n')` – b_laoshi Jun 20 '19 at 08:41

As others have already stated, you can't store or use a NUL character:

  • in a variable
  • in an argument of the command line.

However, you can handle any binary data (including NUL char):

  • in pipes
  • in files

So to answer your last question:

can anybody give me a hint how strings containing \0 chars can be stored or handled efficiently without losing any (meta-) characters?

You can use files or pipes to store and efficiently handle any string, whatever meta-characters it contains.

If you plan to handle data, note additionally that:

  • Only the NUL character is dropped by variables and command-line arguments; every other byte survives, which you can verify yourself.
  • Command substitution (written as $(command..) or `command..`) has an additional twist on top of the variable limitation: it also strips trailing newlines.
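
Both effects can be checked with a short sketch. The sentinel trick in the second half (appending an `x` and stripping it afterwards) is a common idiom, not something this answer prescribes, and the variable names are made up for illustration.

```bash
# NUL bytes are dropped by command substitution
# (recent Bash versions also print a warning on stderr).
nul_dropped=$(printf 'a\0b')
printf '%s' "$nul_dropped" | od -An -c    # only a and b remain

# Trailing newlines are stripped as well; a sentinel character protects them.
kept=$(printf 'line\n\n\n'; printf 'x')
kept=${kept%x}                            # remove the sentinel again
printf '%s' "$kept" | od -An -c           # the three newlines survive
```

The sentinel does nothing for NUL bytes; those still need an encoding step or a pipe.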

Bypassing limitations

If you want to use variables, then you must get rid of the NUL characters by encoding them. Various other answers here give clever ways to do that (an obvious one is base64 encoding/decoding).

If you are concerned about memory or speed, you'll probably want a minimal encoding that quotes only the NUL character (and the quoting character itself). In that case, this would help you:

quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }

Then you can make your data safe before storing it in variables or command-line arguments by piping it through quote, which outputs a data stream without NUL characters. You can get back the original string (with NUL characters) with echo -en "$var_quoted", which writes the original bytes to standard output.

Example:

## Our example output generator, with NUL chars
ascii_table() { echo -en "$(echo '\'0{0..3}{0..7}{0..7} | tr -d " ")"; }
## store
myvar_quoted=$(ascii_table | quote)
## use
echo -en "$myvar_quoted"

Note: pipe the output through hd to get a clean hexadecimal view of your data and check that you didn't lose any NUL characters.
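
The round trip can also be checked automatically instead of by eye. The following is a minimal sketch: sample is a hypothetical data generator, quote is repeated from above so the snippet is self-contained, and GNU sed is assumed for the \x0 escape.

```bash
quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }

# Hypothetical test data containing a backslash and two NUL bytes.
sample() { printf 'back\\slash\0nul\0end'; }

stored=$(sample | quote)                      # no NUL bytes in the variable
before=$(sample | od -An -v -tx1)             # hex dump of the original
after=$(echo -en "$stored" | od -An -v -tx1)  # hex dump after unquoting

if [ "$before" = "$after" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED"
fi
```

The sample data deliberately has no trailing newline, because command substitution would strip it before quote's output reaches the variable.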

Changing tools

Remember that you can go pretty far with pipes, without using variables or command-line arguments. Don't forget, for instance, the <(command ...) construct, which exposes a command's output as a named pipe (a sort of temporary file).
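
As an illustration of staying in pipes, a NUL-delimited stream can be split into a Bash array without the NUL ever being stored in a variable. This is a sketch under assumptions: the file names are invented, and the printf stands in for a real producer such as find -print0.

```bash
files=()
# read -d '' treats NUL as the record delimiter; IFS= and -r keep
# whitespace and backslashes in each entry intact.
while IFS= read -r -d '' entry; do
    files+=("$entry")
done < <(printf '%s\0' 'plain.txt' 'with space.txt' $'new\nline.txt')

printf 'read %d entries\n' "${#files[@]}"
```

Each array element holds one complete name, including embedded spaces and newlines, so the delimiter itself never has to be stored.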

EDIT: the first implementation of quote was incorrect and did not deal correctly with the \ escape sequences interpreted by echo -en. Thanks to @xhienne for spotting that.

EDIT2: the second implementation of quote had a bug because it used only \0, which would eat up following zeroes, since \0, \00, \000 and \0000 are equivalent. So \0 was replaced by \x00. Thanks to @MatthijsSteen for spotting this one.

vaab
  • Interesting answer but the `quote` function seems wrong to me. It properly replaces `\0` characters with `\0` but fails to escape all the escape sequences in the original stream which will end up being interpreted by the subsequent `echo -en` command. – xhienne Feb 12 '18 at 15:32
  • @xhienne spot on! Thanks for your remark, I corrected the implementation of `quote`. – vaab Feb 13 '18 at 01:15
  • While using this, I encountered a bug in the quote function. Instead of using `\0`, which will cause it to eat up to 3 zeroes after it (`echo -en '\00000' ~> 0`), either `\0000` or `\x00` should be used. So it would become e.g.: `quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }` – Matthijs Steen Feb 23 '19 at 23:09

Use uuencode and uudecode for POSIX portability

xxd and base64 are not part of POSIX 7, but uuencode is.

VAR="$(uuencode -m <(printf "a\0\n") /dev/stdout)"
uudecode -o /dev/stdout <(printf '%s\n' "$VAR") | od -tx1

Output:

0000000 61 00 0a
0000003

Unfortunately, I don't see a POSIX 7 alternative to the Bash process substitution extension <() other than writing to a file, and uuencode and uudecode are not installed by default on Ubuntu 12.04 (they are in the sharutils package).

So I guess that the real answer is: don't use Bash for this, use Python or some other saner interpreted language.

Ciro Santilli OurBigBook.com

I love jeff's answer. I would use Base64 encoding instead of xxd. It saves a little space and would be (I think) more recognizable as to what is intended.

VAR=$(echo -ne "foo\0bar" | base64)
echo -n "$VAR" | base64 -d | xargs -0 ...

As for -e: it is needed when echoing a literal string with an encoded NUL ('\0'). I also seem to recall that "echo -e" is unsafe when echoing user input, since users could inject escape sequences that echo will interpret, with bad results. The -e flag is not needed when echoing the stored, encoded string into the decode step.
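
A fuller round trip might look like the sketch below. It rests on assumptions not stated in this answer: GNU coreutils base64 (for the -d flag), Bash 4.4 or newer (for mapfile -d), and made-up item values. Using printf '%s\0' instead of echo -e sidesteps the escape-injection concern.

```bash
# Encode a NUL-delimited list; tr removes base64's line wrapping so the
# variable holds a single line.
encoded=$(printf '%s\0' 'foo' 'bar baz' | base64 | tr -d '\n')

# Decode and split on NUL straight into an array.
mapfile -d '' -t items < <(printf '%s' "$encoded" | base64 -d)

printf '%d items, second is "%s"\n' "${#items[@]}" "${items[1]}"
```

On older Bash versions, a while IFS= read -r -d '' loop can take the place of mapfile -d ''.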

vontrapp
  • `-e` is needed. `echo -n 'a\0b' | xxd -p` gives the hexadecimal `615c3062`, which stands for 4 bytes, not 3. Compare with `echo -ne 'a\0b' | xxd -p` which produces `610062`. (same results with double-quotes) – Martin Jambon Jul 24 '18 at 00:24
  • that's because of the single quotes. -e is telling echo to take the literal string passed to it, and evaluate it for escapes (and other things?). in the case of single quotes echo does get the escape sequence and then interprets that into a null. If instead you use double quotes then the \0 is translated by the shell into a null character before echo gets it. – vontrapp Apr 02 '19 at 02:22
  • `'\0'` and `"\0"` are equivalent (in both posix and bash), and they represent a string of two bytes. `echo "\0"` will print `\0` just like `echo '\0'`. – Martin Jambon Apr 09 '19 at 22:01
  • I stand humbly corrected. Yes, 'a\0b' and "a\0b" and even a\\0b all need the -e flag for echo to output a null char. I'm updating the answer. I do stand by the -e flag being harmful if you have any user input. And at any rate this is *probably* going to be used to capture the output of something besides echo that generates null chars (e.g. find -print0) and there's no need to 'echo' that with -e, nor is there a need to use -e when echoing the stored encoding back into the decode step. An echo -e of a literal string in the code is fine (and necessary, for the example). – vontrapp May 22 '19 at 06:46

Here’s a maximally memory-efficient solution that just escapes the NUL bytes with \xFF.
(Since I wasn’t happy with base64 or the like. :)

esc0() { sed 's/\xFF/\xFF\xFF/g; s/\x00/\xFF0/g'; }
cse0() { sed 's/\xFF0/\xFF\x00/g; s/\xFF\(.\)/\1/g'; }

It of course escapes any actual \xFF by doubling it too, so it works exactly like when backslashes are used for escaping. This is also why a simple mapping can’t be used, and referring to the match in the replacement is required.

Here’s an example that paints gradients onto the framebuffer (doesn’t work in X), using variables to pre-render blocks and lines for speed:

width=7680; height=1080; # Set these to your framebuffer’s size.
blocksPerLine=$(( $width / 256 ))
block="$( for i in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do for j in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do echo -ne "\x$i$j"; done; done | esc0 )"
line="$( for ((b=0; b < blocksPerLine; b++)); do echo -en "$block"; done )"
for ((l=0; l < height; l++)); do echo -en "$line"; done | cse0 > /dev/fb0

Note how $block contains escaped NULs (plus \xFFs); at the end, before everything is written to the framebuffer, cse0 unescapes them.

Evi1M4chine