67

I want to iterate through a list of files without caring about what characters the filenames might contain, so I use a list delimited by null characters. The code will explain things better.

# Set IFS to the null character to hopefully change the for..in
# delimiter from the space character (sadly does not appear to work).
IFS=$'\0'

# Get null delimited list of files
filelist="`find /some/path -type f -print0`"

# Iterate through list of files
for file in $filelist ; do
    # Arbitrary operations on $file here
done

The following code works when reading from a file, but I need to read from a variable containing text.

while read -d $'\0' line ; do
    # Code here
done < /path/to/inputfile
codeforester
Matthew
  • 8
    I don't think it's possible to store null characters in a bash variable. At least, I've never found a way to do it... – Gordon Davisson Dec 30 '11 at 18:25
  • 1
    Confirmed, `bash: warning: command substitution: ignored null byte in input`. This is because bash is intended for posix derivative environments, in which env vars are internally stored in a null-terminated buffer, and bash vars are (in every case I've ever examined) host env vars. – memtha Apr 16 '20 at 15:58
  • You may be able to store a null char in a bash variable, but you cannot get it out, so there is no way to tell. The first example proves that assigning non-displayable chars works (as we all know), e.g. a tab in octal: `test=$'a\011b';echo ${#test} ="${test}"=` results in `3 =a b=`. Then try an octal 0: `test=$'a\0b';echo ${#test} ="${test}"=` results in `1 =a=`; this reports the zero-terminated string length of $test as 1, but the 'b' and another zero may still be stored in the variable; we do not know. – db-inf Oct 15 '22 at 10:59

5 Answers

115

The preferred way to do this is to use process substitution:

while IFS= read -r -d $'\0' file; do
    # Arbitrary operations on "$file" here
done < <(find /some/path -type f -print0)
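
If the list has to be kept around and iterated more than once, a bash array can hold it, because each file name is stored as a separate element and no delimiter is needed inside a variable. The following is only a sketch: it assumes bash 4.4 or newer (where `mapfile` accepts `-d`), and the scratch directory and file names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: collect NUL-delimited find output into an array (assumes bash 4.4+).
demo_dir=$(mktemp -d)   # hypothetical scratch directory for the demo
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/with"$'\n'"newline"

# -d '' makes mapfile split on NUL; -t drops the delimiter from each element.
mapfile -d '' -t files < <(find "$demo_dir" -type f -print0)

printf 'found %d files\n' "${#files[@]}"
for file in "${files[@]}"; do
    printf '<%s>\n' "$file"   # arbitrary operations on "$file" here
done

rm -rf "$demo_dir"
```

Quoting `"${files[@]}"` is what keeps each name intact, including the one containing a newline.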

If you were hell-bent on parsing a bash variable in a similar manner, you can do so as long as the list is not NUL-delimited, because a bash variable cannot hold NUL bytes.
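
A quick way to see why NUL cannot be the delimiter inside a variable is to capture NUL-separated output with command substitution and check what is left. This is a minimal sketch with made-up sample data; recent bash versions also print an "ignored null byte" warning on stderr.

```bash
#!/usr/bin/env bash
# Command substitution drops NUL bytes, so the item boundaries are lost.
var=$(printf 'foo\0bar\0')

printf 'length: %d\n' "${#var}"   # 6 characters: the two items are fused
printf 'value:  %s\n' "$var"
```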

Here is an example of a bash variable holding a tab-delimited string:

$ var=$(echo -ne "foo\tbar\tbaz\t"); 
$ while IFS= read -r -d $'\t' line ; do \
    echo "#$line#"; \
  done <<<"$var"
#foo#
#bar#
#baz#
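
The `IFS=` prefix used in both loops above is there to stop `read` from trimming leading and trailing whitespace when it assigns to a named variable. A small sketch of the difference, using a made-up padded string as input:

```bash
#!/usr/bin/env bash
# Default IFS: read trims leading/trailing whitespace from the named variable.
read -r -d '' trimmed < <(printf '  padded name  \0')

# IFS emptied for this one command only: the whitespace is preserved.
IFS= read -r -d '' kept < <(printf '  padded name  \0')

printf 'default IFS: <%s>\n' "$trimmed"
printf 'empty IFS:   <%s>\n' "$kept"
```

Because the assignment is written as a prefix to `read`, it applies to that command alone and the shell's own `IFS` is left unchanged afterwards.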
SiegeX
  • Excellent, exactly what I was looking for. Thanks! I ended up using your second example. – Matthew Dec 30 '11 at 17:30
  • 5
    What is the use of the IFS since the -d flag is set? – thisirs Mar 06 '13 at 15:02
  • 15
    @thisirs By setting `IFS` to the null string, leading and trailing whitespace characters will be preserved. – toxalot Mar 10 '14 at 04:48
  • Will "inlining" the `IFS` declaration behave differently than if it was in a separate line? Specifically, will it "scope it", so that after the `read` command, `IFS` will be set back to whatever it was? – Camilo Martin Feb 24 '15 at 03:45
  • 1
    @CamiloMartin exactly. If there is no `;` after the variable assignment then its value will only apply to the command it prefixes, namely `read`. You can prove this to yourself by running something like this `IFS=$'\t'; while IFS= read -r -d '' file; do od -c <<<"$IFS"; done < <(echo -e 'foo\0')` and note that IFS is still set to a tab inside the loop. – SiegeX Feb 25 '15 at 16:39
  • 1
    Does this actually work when using the here string syntax (`<<<`) as in the first example? In my tests only the process substitution works (`< <(...)`): `while IFS= read -r -d '' f; do echo "$f"; done <<< "$(find Documents -print0)"` produces no output while `while IFS= read -d '' f; do echo "$f"; done < <(find Documents -print0)` lists the files in Documents. – joanpau Aug 03 '16 at 10:51
  • 4
    Fully agree with @joanpau. I don't know how stuff worked in 2011, but in 2016 with bash4 this method does not work. You can easily verify that this fails if you assign `var=$(find . -print0)` prior to the while loop. Process substitution does work, but variables do not. Even building a variable on the fly, like `var=$(echo -e "some\0text")`, will fail to separate `some` from `text`. For variables you need a trick like this: http://stackoverflow.com/questions/6570531/assign-string-containing-null-character-0-to-a-variable-in-bash – George Vasiliou Feb 22 '17 at 15:17
  • 3
    This is not setting IFS to the null string. It is [unsetting IFS, causing it to use the default values ($' \t\n')](https://mywiki.wooledge.org/IFS). – jeremysprofile Nov 25 '18 at 16:53
  • Even if you could get nulls into an env var, you would still be limited to the maximum size of env vars (see the xargs docs). By using a pipe or other redirection, not only can you move an unlimited(*) amount of data, but the two procedures can run in parallel; each process will wait for the others as needed. *"unlimited" irrespective of time of operation and other environmental conditions. – memtha Apr 16 '20 at 16:14
  • 6
    @jeremysprofile actually, `IFS=` is *disabling* word-splitting. It is *not* setting IFS back to its default value like `unset IFS` does. – SiegeX May 06 '20 at 02:21
  • `echo -en ' x \0' | while read -r -d ''; do echo "<$REPLY>"; done` prints `< x >` in bash3 and bash5 — demonstrating an undocumented `read -d` feature. – Devon Jun 29 '20 at 01:05
5

Pipe them to xargs -0:

files="$( find ./ -iname 'file*' -print0 | xargs -0 )"

xargs manual:

-0, --null
    Input items are terminated by a null character instead of
    by whitespace, and the quotes and backslash are not
    special (every character is taken literally).
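
One caveat: with no command given, `xargs` runs `echo`, so capturing its output in a variable joins the names with spaces and the boundaries are lost again. A hedged sketch of an alternative is to let `xargs -0` pass the names straight to a command as arguments; the scratch directory and file names below are invented for the demonstration.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)   # hypothetical scratch directory for the demo
touch "$demo_dir/file one" "$demo_dir/file"$'\t'"two"

# xargs -0 splits on NUL and hands each name to the command as one argument,
# so the child shell sees exactly two arguments despite the space and the tab.
report=$(find "$demo_dir" -type f -iname 'file*' -print0 |
    xargs -0 sh -c 'printf "%s arguments received\n" "$#"' _)
printf '%s\n' "$report"

rm -rf "$demo_dir"
```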
Victor Sergienko
Maltigo
2

Use env -0 to output the assignments separated by zero bytes.

env -0 | while IFS='' read -r -d '' line ; do
    var=${line%%=*}
    value=${line#*=}
    echo "Variable '$var' has the value '$value'"
done
choroba
1

In terms of readability and maintainability, a bash function might be cleaner.

An example that converts MOV files to MP4 using ffmpeg (works with files containing spaces and special characters):

#!/usr/bin/env bash

do_convert () { 
  new_file="${1%.*}.mp4"  # strip the extension (any case, e.g. .MOV) and append .mp4
  ffmpeg -i "$1" "$new_file" && rm "$1" 
}

export -f do_convert  # needed to make the function visible inside xargs

find . -iname '*.mov' -print0 | xargs -0 -I {} bash -c 'do_convert "$1"' _ {}

This does not apply to the OP's question, but if your input is generated by find, there is no need to pipe through xargs -0: find can run the command itself with -exec and passes each file name as a single argument, whatever characters it contains. If you don't care about readability and maintainability, the command above can be simplified to:

find . -type f -iname "*.mov" -exec bash -c 'ffmpeg -i "${1}" "${1%.*}.mp4" && rm "${1}"' _ {} \;
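
The `\;` form starts one bash process per file. As a sketch of a variant, `{} +` lets find pass many names to a single bash invocation, which then loops over its arguments. To keep the example harmless, the ffmpeg call is replaced here by a printf stand-in, and the scratch directory and file names are made up.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)   # hypothetical scratch directory for the demo
touch "$demo_dir/holiday clip.mov" "$demo_dir/INTRO.MOV"

# "{} +" batches the file names; "for src" iterates over the positional args.
plan=$(find "$demo_dir" -type f -iname '*.mov' -exec bash -c '
    for src; do
        printf "%s -> %s\n" "$src" "${src%.*}.mp4"   # stand-in for ffmpeg
    done' _ {} +)
printf '%s\n' "$plan"

rm -rf "$demo_dir"
```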
ccpizza
  • As noted in this answer, it doesn't handle arbitrary strings with null characters. However, it does answer a different, common question for performing actions on files without using the "-print0 | xargs -0" form, and without using IFS= overrides. – Groboclown Apr 12 '23 at 22:14
-6

I tried working with the bash examples above, finally gave up, and used Python, which worked the first time. For me, the problem turned out to be simpler outside the shell. I know this is arguably off topic for a bash question, but I'm posting it here anyway in case others want an alternative.

import sh
import path
files = path.Path(".").files()
for x in files:
    sh.cp("--reflink=always", x, "UUU00::%s"%(x.basename(),))
    sh.cp("--reflink=always", x, "UUU01::%s"%(x.basename(),))
Henry Crutcher