
I have written the following function:

function print0(){
  stdin=$(cat);
  echo "$stdin" | awk 'BEGIN {ORS="\000";}; { print $0}';
}

which works like find's -print0 argument, but for any command that pipes its output into it. It is useful with xargs -0. Then I realized that the opposite of this function would be useful too. I have tried the following:

function read0(){
  stdin=$(cat);
  echo "$stdin" | awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}';

  # EQUIVALENTS:
  # echo "$stdin" | perl -nle '@a=join("\n", split(/\000/, $_)); print "@a"'
  # echo "$stdin" | perl -nle '$\="\n"; @a=split(/\000/, $_); foreach (@a){print $_;}'
}

But it does not work. Interestingly, when I tried just the bare commands (awk or perl), they worked like a charm:

# WORKING
ls | print0 | awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}'
ls | print0 | perl -nle '@a=join("\n", split(/\000/, $_)); print "@a"'
ls | print0 | perl -nle '$\="\n"; @a=split(/\000/, $_); foreach (@a){print $_;}'


# NOT WORKING
ls | print0 | read0

What am I doing wrong? I assume something goes wrong with the null characters in this command: stdin=$(cat);
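Your suspicion can be checked directly: command substitution cannot preserve NUL bytes, because bash stores values as NUL-terminated C strings. A minimal demonstration (assuming bash; versions 4.4 and later also print a warning on stderr):

```shell
# Command substitution drops NUL bytes from the captured output.
s=$(printf 'a\0b')
printf '%s\n' "$s"     # prints "ab" -- the NUL is gone
printf '%s\n' "${#s}"  # prints 2, not 3
```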

EDIT: Thank you all; the conclusion is that bash variables cannot hold null bytes. PS: the command was just an example; I know that converting nulls to newlines and vice versa has no practical purpose by itself.

Wakan Tanka
  • +1 Interesting. Can't help you though. –  Dec 10 '13 at 19:43
  • The shell uses C strings internally, and C uses `\0` as the string terminator. So you can't have that as a character in a string, it will just end it. – Barmar Dec 10 '13 at 19:46

2 Answers


I would say that your implementation can be simplified as

function print0 { tr '\n' '\0'; }
function read0  { tr '\0' '\n'; }

which works as you want.

But it adds no value: you merely switch between newline-separated and NUL-separated records, whereas find ... -print0 exists precisely to handle filenames that contain newlines, and this round-trip cannot represent them.
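To see why: a name with an embedded newline becomes indistinguishable from two names once the stream is re-split on newlines. A sketch using the tr-based functions above:

```shell
function print0 { tr '\n' '\0'; }
function read0  { tr '\0' '\n'; }

# One "filename" containing a newline...
printf 'bad\nname\n' | print0 | read0
# ...comes out as two lines, "bad" and "name" -- the record boundary is lost.
```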

The practical view of your question - how can strings with embedded NUL characters be handled in bash - has been discussed on SO: assign string containing null-character (\0) to a variable in bash. The bottom line is, you have to escape them. Other than that, zsh supports embedded NUL characters, but apparently no other shell does.

There has been a related discussion on bug-bash about the handling of NUL characters by the read shell builtin, which you may find interesting.
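For completeness, the usual bash idiom for consuming NUL-delimited input is read with an empty delimiter, which avoids storing the whole stream in one variable (a bash-specific sketch; the find invocation is illustrative):

```shell
# read -d '' makes NUL the record separator; each name lands in $f intact,
# even if it contains spaces or newlines.
while IFS= read -r -d '' f; do
    printf 'got: %s\n' "$f"
done < <(find . -maxdepth 1 -type f -print0)
```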

user2719058

As the other answers/comments mention, you can't put a null character in a bash string variable. However if you can get rid of the variables and just handle the data in pipes/streams, then you can pass null characters through just fine:

function print0() {
  awk 'BEGIN {ORS="\000";}; {print $0}';
}

function read0() {
  awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}';
}
ubuntu@ubuntu:~/dir$ ls -1
file one
file_two
ubuntu@ubuntu:~/dir$ ls | print0 | read0
file one
file_two
ubuntu@ubuntu:~/dir$ 

Also, using ls this way is dangerous, because it won't work for filenames that contain newlines. As far as I'm aware, find is the usual way to programmatically get a list of files in a directory when odd characters appear in filenames.
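For reference, the standard NUL-safe pipeline looks like this (a sketch; -print0 and -0 are GNU/BSD extensions rather than strict POSIX):

```shell
# List regular files NUL-delimited and hand them to xargs safely;
# names with spaces or newlines arrive as single arguments.
find . -type f -print0 | xargs -0 ls -ld
```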


Update:

Here's another way to programmatically get a list of files in a directory when odd characters appear in the filenames, without using find (or the flawed ls). We can use a * glob to read all names in the directory into a bash array, then print each array member, using one NUL byte read from /dev/zero as the delimiter:

#!/bin/bash

shopt -s nullglob
shopt -s dotglob    # display .files as well

dirarray=( * )

for ((i = 0 ; i < ${#dirarray[@]}; i++)); do
    [ "$i" != "0" ] && head -c1 /dev/zero   # emit one NUL between records
    printf '%s' "${dirarray[$i]}"           # '%s' keeps % and \ in names literal
done
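A quick sanity check of the loop (a sketch; it creates throwaway files in a scratch directory, and decodes the NUL delimiters with tr only for display):

```shell
cd "$(mktemp -d)"                # scratch directory
touch 'file one' file_two

shopt -s nullglob dotglob
dirarray=( * )
for ((i = 0; i < ${#dirarray[@]}; i++)); do
    [ "$i" != "0" ] && head -c1 /dev/zero
    printf '%s' "${dirarray[$i]}"
done | tr '\0' '\n'
# file one
# file_two
```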
Digital Trauma
  • I think "\0" and "\000" ought to be identical on any awk -- at least, they are on the three awks I happen to have access to -- but setting `ORS` to `NUL`, regardless of how you do it, doesn't actually "work" on either `mawk` or the current version of the original awk code. What's `awk` on your system? – rici Dec 11 '13 at 03:08
  • @rici - you're right - "\0" and "\000" seem to be equivalent now - I must have been doing something wrong. `awk` and `gawk` on my Ubuntu 12.04 are both "GNU Awk 3.1.8". – Digital Trauma Dec 11 '13 at 03:39
  • The array thing can be simplified to `printf "%s\0" "${dirarray[@]}"`. This prints a trailing null, but I think that's what you usually want. If all you want is to simulate `find ... -print0`, you can do without the array and just say `printf "%s\0" *`. – user2719058 Dec 16 '13 at 00:51
  • @user2719058 - yes - very elegant! – Digital Trauma Dec 16 '13 at 01:05