
I want to iterate over a list of files. This list is the result of a find command, so I came up with:

getlist() {
  for f in $(find . -iname "foo*")
  do
    echo "File found: $f"
    # do something useful
  done
}

It's fine except if a file has spaces in its name:

$ ls
foo_bar_baz.txt
foo bar baz.txt

$ getlist
File found: foo_bar_baz.txt
File found: foo
File found: bar
File found: baz.txt

What can I do to avoid the split on spaces?

codeforester
gregseth
  • This is basically a specific subcase of [When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Jun 06 '20 at 10:53

12 Answers


You could replace the word-based iteration with a line-based one:

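find . -iname "foo*" | while IFS= read -r f
do
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes
    printf 'File found: %s\n' "$f"   # ... rest of the loop body
done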
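A more defensive variant of this loop is sketched below. It assumes bash (for `read -d` and process substitution) and a find that supports -print0 (GNU and BSD find both do). Reading NUL-delimited names means spaces, backslashes, surrounding whitespace and even newlines in names survive. The list is read on file descriptor 3 rather than stdin, so a command in the loop body that reads stdin cannot swallow file names. The file names in the demo are made up.

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited find loop that keeps stdin free for the loop body.
getlist() {
    local f
    # -d '': split on NUL bytes; IFS=: keep surrounding whitespace; -r: keep backslashes
    while IFS= read -r -d '' f <&3; do
        printf 'File found: %s\n' "$f"
        # Commands here that read stdin (prompts, ffmpeg, ...) cannot eat file names.
    done 3< <(find . -iname 'foo*' -print0)
}

# Demo in a throwaway directory with made-up file names.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo bar baz.txt' 'foo_bar_baz.txt' $'foo\nnewline' 'foo\back'
getlist
```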
martin clayton
  • This is extremely clean. And makes me feel nicer than changing IFS in conjunction with a for loop – Derrick Aug 18 '11 at 04:13
  • This will split a single file path that contains a \n. OK, those shouldn’t be around but they can be created: `touch "$(printf "foo\nbar")"` – Ollie Saunders Oct 17 '13 at 05:14
  • To prevent any interpretation of the input (backslashes, leading and trailing whitespace), use `while IFS= read -r f` instead. – mklement0 Apr 02 '16 at 17:26
  • This [answer](http://stackoverflow.com/a/21663203/1116842) shows a more secure combination of `find` and a while loop. – moi Aug 13 '16 at 10:40
  • Seems like pointing out the obvious, but in nearly all simple cases, `-exec` is going to be cleaner than an explicit loop: `find . -iname "foo*" -exec echo "File found: {}" \;`. Plus, in many cases you can replace that last `\;` with `+` to put lots of files in the one command. – naught101 Sep 27 '16 at 00:22
  • @naught101, indeed. In that case, `-exec printf 'File found: %s\n' {} +` would be the alternative printing `File found` for each individual file, and being compatible with `-exec ... {} +`. – Charles Duffy Nov 22 '16 at 21:06
  • Even after several years, this answer has some major bugs. (1) - `read f` trims trailing whitespace from filenames before assigning them to `f`; to avoid this, it should be `IFS= read f`. (2) - `read f` consumes backslashes in filenames -- a file created with `touch 'foo\bar'` would have simply `foobar` assigned to `f`, whereas a filename *ending* in a backslash would cause the unrelated file on the next line to be appended to its name and read as a single line. – Charles Duffy Mar 08 '17 at 13:35
  • (3) - Line-based iteration doesn't work for filenames with literal newlines in their names. `touch $'foo\nbar'` is a perfectly legal command; to correctly be able to iterate over the file it creates, you'd want to NUL-delimit your stream. – Charles Duffy Mar 08 '17 at 13:36
  • @serv-inc - The answer, given by Charles-Duffy, is already in the [link that moi gave](http://stackoverflow.com/a/21663203/1116842). – Diagon Jan 02 '19 at 22:17
  • Careful. Any reading of stdin in the loop body will eat some of your input. This can happen if a command wants y/N confirmation, for example. (You may not even notice, if the command that takes a line of your input doesn't complain!) – automorphic Mar 19 '19 at 02:36
  • This doesn't give me the intended solution on Ubuntu 18.04, bash 4.4. I still get errors due to splitting up the paths when I have spaces in filenames. – cslotty Jul 16 '19 at 11:04
  • @CharlesDuffy: Re your comment of 03.08.2017 - For my edification, if one recognizes "major bugs" in an answer of this vintage, why would one NOT make edits to the answer to remove those bugs? –  May 05 '20 at 16:55
  • @Seamus, as a matter of etiquette, it's typical that an answer's author be given the opportunity to perform edits themselves or object to them (explaining why, in their view, the code is better as-is). There's been ample opportunity, so I'd consider this fair game to edit this long after the comment describing the issues was made. Not a strict rule by any means -- one *could* just edit -- but I prefer that people inquire about my choices before applying edits to my answers, so I extend that same courtesy to others. – Charles Duffy May 05 '20 at 19:00
  • @CharlesDuffy: Makes sense - polite, yet persistent. –  May 05 '20 at 23:10
  • @CharlesDuffy (and Seamus) I don't have time to work it. Perfectly fine for the answer to be edited. I can delete it if that helps/saves time? – martin clayton May 06 '20 at 08:55
  • Warning! Some commands, including ffmpeg, will read from stdin, and confuse the "read" command used in the loop. Symptoms are beginnings of the filename being cut off (=> invalid names). You need to disable this behaviour in the command that reads from stdin, two solutions here: https://stackoverflow.com/a/21634699 I would also refer to this more secure answer: https://stackoverflow.com/a/21663203/4566986 – Seneral May 09 '23 at 18:16

There are several workable ways to accomplish this.

If you wanted to stick closely to your original version, it could be done this way:

getlist() {
        local IFS=$'\n'   # split only on newlines; local keeps the change inside the function
        for file in $(find . -iname 'foo*') ; do
                printf 'File found: %s\n' "$file"
        done
}

This will still fail if file names contain literal newlines, and each resulting line is still subject to globbing (a file named foo* would be expanded again), but spaces will not break it.
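If you do keep the `IFS=$'\n'` approach, one way to also switch off globbing is sketched below, assuming bash. The function body is a subshell `( ... )`, so both `IFS` and `set -f` revert automatically when it returns. Names containing newlines still break this version; the file names in the demo are made up.

```shell
#!/usr/bin/env bash
# Sketch: newline-split for loop with globbing disabled.
# The body runs in a subshell, so IFS and `set -f` do not leak to the caller.
getlist() (
    IFS=$'\n'   # split command substitution output on newlines only
    set -f      # do not glob-expand the resulting words (e.g. a file named foo*)
    for file in $(find . -iname 'foo*'); do
        printf 'File found: %s\n' "$file"
    done
)

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo bar baz.txt' 'foo*' 'foo_x'
getlist
```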

However, messing with IFS isn't necessary. Here's my preferred way to do this:

getlist() {
    while IFS= read -d $'\0' -r file ; do
            printf 'File found: %s\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

If you find the < <(command) syntax unfamiliar, you should read about process substitution. The advantage of this over for file in $(find ...) is that file names containing spaces, newlines and other special characters are handled correctly. This works because find with -print0 uses a null byte (aka \0) as the terminator for each file name and, unlike newline, null is not a legal character in a file name.
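To see why NUL termination matters, here is a small sketch that counts find's output both ways in a scratch directory where one of the (made-up) names contains a newline:

```shell
#!/usr/bin/env bash
# Sketch: -print0 ends each name with exactly one NUL byte, so counting NULs
# counts files; counting newlines overcounts when a name contains a newline.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo with spaces' $'foo with\nnewline' 'foo plain'

nul_count=$(( $(find . -iname 'foo*' -print0 | tr -cd '\0' | wc -c) ))
line_count=$(( $(find . -iname 'foo*' | wc -l) ))

printf 'NUL-terminated names: %d\n' "$nul_count"       # 3
printf 'newline-terminated lines: %d\n' "$line_count"  # 4
```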

The advantage to this over the nearly-equivalent version

getlist() {
        find . -iname 'foo*' -print0 | while IFS= read -d $'\0' -r file ; do
                printf 'File found: %s\n' "$file"
        done
}

is that any variable assignment made in the body of the while loop is preserved after the loop ends. If you pipe to while as above, the body of the while loop runs in a subshell, so those assignments are lost, which may not be what you want.
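A small sketch that makes the difference visible by counting files in both styles. It assumes bash with its default settings; with `shopt -s lastpipe` (and job control off, as in scripts) the last element of a pipeline runs in the current shell, and the piped count would be kept too.

```shell
#!/usr/bin/env bash
# Sketch: variable assignments in a piped while loop vs. process substitution.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo one' 'foo two' 'foo three'

piped_count=0
find . -iname 'foo*' -print0 | while IFS= read -r -d '' file; do
    piped_count=$((piped_count + 1))   # runs in a subshell; change is lost
done

subst_count=0
while IFS= read -r -d '' file; do
    subst_count=$((subst_count + 1))   # runs in the current shell; change is kept
done < <(find . -iname 'foo*' -print0)

# prints: pipe: 0, process substitution: 3
printf 'pipe: %d, process substitution: %d\n' "$piped_count" "$subst_count"
```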

The advantage of the process substitution version over find ... -print0 | xargs -0 is minimal: The xargs version is fine if all you need is to print a line or perform a single operation on the file, but if you need to perform multiple steps the loop version is easier.
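For the single-operation case, the xargs form can be a one-liner. A sketch, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do); it relies on printf reusing its format string once per argument:

```shell
#!/usr/bin/env bash
# Sketch: one operation over all matching files with find -print0 | xargs -0.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo bar baz.txt' 'foo_bar_baz.txt'

# xargs -0 splits its input on NUL bytes, matching find's -print0, and passes
# the names as arguments; printf repeats its format once per argument.
out=$(find . -iname 'foo*' -print0 | xargs -0 printf 'File found: %s\n')
printf '%s\n' "$out"
```

With GNU xargs, add -r so that printf is not run at all when find matches nothing.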

EDIT: Here's a nice test script so you can compare the different attempts at solving this problem:

#!/usr/bin/env bash

dir=/tmp/getlist.test/
mkdir -p "$dir"
cd "$dir"

touch       'file not starting foo' foo foobar barfoo 'foo with spaces'\
    'foo with'$'\n'newline 'foo with trailing whitespace      '

# while with process substitution, null terminated, empty IFS
getlist0() {
    while IFS= read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

# while with process substitution, null terminated, default IFS
getlist1() {
    while read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

# pipe to while, newline terminated
getlist2() {
    find . -iname 'foo*' | while read -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# pipe to while, null terminated
getlist3() {
    find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# for loop over subshell results, newline terminated, default IFS
getlist4() {
    for file in "$(find . -iname 'foo*')" ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# for loop over subshell results, newline terminated, newline IFS
getlist5() {
    IFS=$'\n'
    for file in $(find . -iname 'foo*') ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}


# see how they run
for n in {0..5} ; do
    printf '\n\ngetlist%d:\n' $n
    eval getlist$n
done

rm -rf "$dir"
sorpigal
  • Accepted your answer: the most complete and interesting -- I didn't know about `$IFS` and the `< <(cmd)` syntax. Still one thing remains obscure to me, why the `$` in `$'\0'`? Thanks a lot. – gregseth Aug 12 '11 at 12:05
  • @gregseth: This is bash syntax for a literal escape character. For example, if you say CTRL+V and then hit TAB you insert a literal tab. This won't look right when copied and pasted elsewhere, however, but the syntax `$'\t'` will be evaluated as a tab and works the same way. It's just a convenient way to pass certain characters to commands without worrying about the shell mangling them. – sorpigal Aug 12 '11 at 13:05
  • +1, but you should add ...`while IFS= read`... to handle files that start or end with whitespace. – Gordon Davisson Aug 12 '11 at 14:55
  • @Gordon Davisson: Argh, thanks. It's always something. I have updated my answer to fix that problem. I've also included a script which should help show the difference between different implementations in case anyone is wondering why `IFS=` matters. – sorpigal Aug 12 '11 at 15:17
  • There is one caveat to the process substitution solution. If you have any prompt inside the loop (or are reading from STDIN in any other way), the input will be filled by the stuff you feed into the loop. (maybe this should be added to the answer?) – andsens Dec 12 '13 at 18:39
  • @Sorpigal This is a very good answer, but you're still missing `IFS=`, in a few places. – Reinstate Monica Please Nov 08 '14 at 19:25
  • @andsens You can just read/write using a different file descriptor. – Reinstate Monica Please Nov 08 '14 at 19:28
  • @BrowSlow Of course! I didn't think of that, thanks. Some googling suggests that mkfifo would be the best tool to create a new FD in this scenario, correct? – andsens Nov 09 '14 at 21:00
  • @BroSlow: There are a variety of holes in this example. I have a more complete one somewhere, to which I will add a link when I have a chance to post it. – sorpigal Nov 14 '14 at 15:11
  • The `< <(cmd)` syntax is what I was looking for. However, script run with `#!/bin/sh` shebang fails. This means it is not portable - not a big issue, but it explains why it's not widely known/used so far. – uvsmtid Nov 17 '15 at 09:01
  • @uvsmtid: This question was tagged `bash` so I felt safe using bash-specific features. Process substitution is not portable to other shells (sh itself is not likely to ever receive such a significant update). – sorpigal Nov 28 '15 at 13:48
  • Combining `IFS=$'\n'` with `for` prevents the line-internal word-splitting, but still makes the resulting lines subject to globbing, so this approach isn't fully robust (unless you also turn off globbing first). While `read -d $'\0'` works, it is slightly misleading in that it suggests that you can use `$'\0'` to create NULs - you can't: a `\0` in an [ANSI C-quoted string](http://www.gnu.org/software/bash/manual/bash.html#ANSI_002dC-Quoting) effectively _terminates_ the string, so that `-d $'\0'` is effectively the same as `-d ''`. – mklement0 Apr 02 '16 at 17:36

There is also a very simple solution: rely on bash globbing.

$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid   file 3"
$ ls
stupid   file 3  stupid file1     stupid file2
$ for file in *; do echo "file: '${file}'"; done
file: 'stupid   file 3'
file: 'stupid file1'
file: 'stupid file2'

I don't see any special setting in my shopt, and this is in fact bash's default behavior: the file names produced by glob expansion are not split on spaces again, so it should be "safe" (tested on OS X and Ubuntu).
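One thing to watch for: when nothing matches, bash leaves the pattern unexpanded, so the loop body runs once with the literal pattern as the "file name". A sketch that avoids this with the nullglob option, assuming bash and using the question's foo* pattern (non-recursive, unlike find):

```shell
#!/usr/bin/env bash
# Sketch: glob-based loop that does nothing when there are no matches.
shopt -s nullglob   # unmatched patterns expand to nothing instead of themselves

getlist() {
    local f
    for f in foo*; do
        [[ -f $f ]] || continue   # skip directories and other non-regular files
        printf "File found: '%s'\n" "$f"
    done
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
empty_out=$(getlist)          # empty directory: no output at all
touch 'foo bar baz.txt' 'foo_bar_baz.txt'
getlist
```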

marchelbling
find . -iname "foo*" -print0 | xargs -L1 -0 echo "File found:"
Karoly Horvath
find . -name "fo*" -print0 | xargs -0 ls -l

See man xargs.

slhck
Torp

Since you aren't doing any other type of filtering with find, you can use the following as of bash 4.0:

shopt -s globstar
getlist() {
    for f in **/foo*
    do
        echo "File found: $f"
        # do something useful
    done
}

The **/ will match zero or more directories, so the full pattern will match foo* in the current directory or any subdirectories.
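`find -iname` matches case-insensitively and also descends into hidden directories, while `**/foo*` does neither by default. A sketch that brings the glob closer to the find command, assuming bash 4.0 or later, sets a few extra shell options. It runs in a subshell so the `shopt` changes stay local; `nullglob` keeps the loop from running once on the literal pattern when nothing matches. The directory layout in the demo is made up.

```shell
#!/usr/bin/env bash
# Sketch: recursive glob that behaves more like `find . -iname 'foo*'`.
getlist() (
    # globstar: ** recurses; nocaseglob: like -iname; dotglob: enter hidden dirs;
    # nullglob: no iterations (rather than a literal pattern) when nothing matches.
    shopt -s globstar nocaseglob dotglob nullglob
    for f in **/foo*; do
        printf 'File found: %s\n' "$f"
    done
)

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir -p sub/.hidden
touch 'foo bar.txt' 'sub/FOO upper.txt' 'sub/.hidden/foo deep.txt'
getlist
```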

chepner

I really like for loops and array iteration, so I figure I will add this answer to the mix...

I also liked marchelbling's stupid file example. :)

$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid   file 3"

Inside the test directory:

readarray -t arr <<< "$(ls -A1)"

This reads each line of the listing into its own element of a bash array named arr, with the trailing newline removed (that is what -t does).

Let's say we want to give these files better names...

for i in "${!arr[@]}"
do
    newname=$(printf '%s\n' "${arr[$i]}" | sed 's/stupid/smarter/; s/  */_/g')
    mv -- "${arr[$i]}" "$newname"
done

${!arr[@]} expands to the array indices 0 1 2, so "${arr[$i]}" is the ith element of the array. The quotes around the variables are important to preserve the spaces.

The result is three renamed files:

$ ls -1
smarter_file1
smarter_file2
smarter_file_3
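A variant of the same idea that fills the array with a glob instead of parsing ls (so even names containing newlines survive) and renames with bash parameter expansion instead of sed might look like the sketch below. The stupid/smarter names come from the example above; the loop that squeezes runs of spaces is just one way to mimic `s/  */_/g`.

```shell
#!/usr/bin/env bash
# Sketch: rename files held in an array, without parsing ls or calling sed.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch "stupid file1" "stupid file2" "stupid   file 3"

shopt -s nullglob dotglob   # like `ls -A`: include hidden files, never . or ..
arr=(*)                     # one array element per file name, spaces intact

for i in "${!arr[@]}"; do
    newname=${arr[i]/stupid/smarter}        # first "stupid" -> "smarter"
    while [[ $newname == *"  "* ]]; do      # squeeze runs of spaces to one...
        newname=${newname//  / }
    done
    newname=${newname// /_}                 # ...then turn each space into "_"
    mv -- "${arr[i]}" "$newname"
done

ls -1
```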
terafl0ps

find has an -exec action that runs an arbitrary command on the find results. For example:

find . -iname "foo*" -exec echo "File found: {}" \;

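Here {} is replaced by each file name that find locates. find runs the command itself rather than through a shell, so the name is passed along inside a single argument and spaces in it are preserved; the double quotes just make "File found: {}" one argument to echo. (Substituting {} inside a larger argument like this works in GNU find, but POSIX leaves it implementation-defined.)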

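In many cases you can replace that last \; (which ends the command and runs it once per file) with +, which will put multiple file names in one command (not necessarily all of them at once, though; see man find for more details). With +, {} must be a separate argument placed immediately before the +, so the echo example above would need to become something like -exec printf 'File found: %s\n' {} +.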
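When the body needs more than one command, a common pattern is to let `-exec ... {} +` hand batches of names to a small inline sh script that loops over its positional parameters. The `sh` after the script text fills `$0`, and the file names become `"$@"`. This is a sketch; the `wc -c` step is only there to show a second command, and the file names are made up.

```shell
#!/usr/bin/env bash
# Sketch: several commands per file via find -exec sh -c '...' sh {} +
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
printf 'hello\n' > 'foo bar.txt'
printf 'hi\n' > 'foo_baz.txt'

# find appends batches of names after "sh"; inside the script they are "$@".
out=$(find . -iname 'foo*' -type f -exec sh -c '
    for f in "$@"; do
        printf "File found: %s\n" "$f"
        printf "  size: %s bytes\n" "$(wc -c < "$f" | tr -d " ")"
    done
' sh {} +)
printf '%s\n' "$out"
```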

naught101

I recently had to deal with a similar case, and I built a FILES array to iterate over the filenames:

eval FILES=($(find . -iname "foo*" -printf '"%p" '))

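The idea here is to surround each filename with double quotes, separate them with spaces, and use the result to initialize the FILES array. The eval is needed so that the double quotes in the find output act as quoting when the array is initialized. Be aware that this breaks on file names that contain double quotes or newlines, and that eval will execute anything like $(...) or backticks that appears in a file name.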

To iterate over the files, just do:

for f in "${FILES[@]}"; do
    printf 'File found: %s\n' "$f"
    # Do something else with $f
done
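A sketch that builds the same FILES array without eval, so quotes, `$` or backticks in names are never interpreted as shell code. `mapfile -d ''` needs bash 4.4 or later; on older versions a `while IFS= read -r -d ''` loop can append to the array instead. The file names are made up.

```shell
#!/usr/bin/env bash
# Sketch: build the FILES array from NUL-delimited find output (bash >= 4.4).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 'foo "quoted"' 'foo $(date)' 'foo   spaced'

# -d '': split on NUL bytes; -t: drop the delimiter from each element
mapfile -d '' -t FILES < <(find . -iname 'foo*' -print0)

for f in "${FILES[@]}"; do
    printf 'File found: %s\n' "$f"
done
```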
lemraus

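In some cases, for example if you just need to copy or move a list of files, you could also pipe that list to awk.
The important part is the \"" and "\" around the field $0 (the whole input line, i.e. one file name, since find prints one file per line), which quotes each name for the sh that the generated mv commands are piped to. Names containing double quotes, backslashes, $, backticks or newlines will still break it.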

find . -iname "foo*" | awk '{print "mv \""$0"\" ./MyDir2" | "sh" }'
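If the goal is only to move the files, find can hand the names to mv directly, which avoids building shell commands as text. A sketch, with `./MyDir2` taken from the example above; the `-prune` keeps find from revisiting files already moved into that directory, and the file names are made up.

```shell
#!/usr/bin/env bash
# Sketch: move matching files into ./MyDir2 without generating shell code.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir MyDir2
touch 'foo bar.txt' 'foo "quoted".txt' 'other.txt'

# Skip ./MyDir2 itself, then pass batches of matching files to one inline sh,
# which moves them all ("$@") into the target directory.
find . -path ./MyDir2 -prune -o -iname 'foo*' -type f \
    -exec sh -c 'mv -- "$@" ./MyDir2' sh {} +

ls MyDir2
```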
Steve

Ok - my first post on Stack Overflow!

Though my problems with this have always been in csh, not bash, I'm sure the solution I present will work in both. The issue is with the shell's interpretation of what "ls" returns. We can remove "ls" from the problem by simply using the shell's expansion of the * wildcard, but this gives a "no match" error if there are no files in the current (or specified) folder. To get around this, we simply extend the expansion to include dot-files, thus: * .* This will always yield results, since the entries . and .. are always present. So in csh we can use this construct:

foreach file (* .*)
   echo $file
end

If you want to filter out the standard entries . and .., that is easy enough:

foreach file (* .*)
   if ("$file" == .) continue
   if ("$file" == ..) continue
   echo $file
end

The code in the question would be written thus:

getlist() {
  for f in * .*
  do
    echo "File found: $f"
    # do something useful
  done
}

Hope this helps!
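In bash specifically, two details differ from csh: an unmatched `*` is left as a literal `*` rather than raising an error, and whether `.*` matches `.` and `..` depends on the bash version (bash 5.2 stopped matching them by default). A bash sketch that sidesteps both, using the `dotglob` and `nullglob` options instead of `* .*`, might be:

```shell
#!/usr/bin/env bash
# Sketch: list every entry in the current directory, hidden ones included.
getlist() (
    shopt -s dotglob nullglob   # * also matches dot-files (never . or ..);
                                # nothing matches -> zero iterations
    for f in *; do
        printf 'File found: %s\n' "$f"
    done
)

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
empty_out=$(getlist)            # empty directory: no output at all
touch 'plain file' '.hidden file'
getlist
```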


Another solution for the job...

The goal was to:

  • select/filter file names recursively in directories
  • handle each name, whatever spaces its path contains

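#!/bin/bash -e
## @Trick in order to handle files with spaces in their path...
OLD_IFS=${IFS}
IFS=$'\n'                        # split find's output on newlines only
files=($(find "${INPUT_DIR}" -type f -name "*.md"))
IFS=${OLD_IFS}                   # restore IFS as soon as the array is built
for filename in "${files[@]}"    # quoted: exactly one iteration per element
do
      printf 'Processing: %s\n' "$filename"
      # do your stuff
done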
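A variant of the same idea that leaves IFS alone and also survives newlines in paths reads `find -print0` output into the array one element at a time, so it does not need `mapfile -d` (bash 4.4+). `INPUT_DIR` is the snippet's own variable; pointing it at a scratch directory with made-up files is an assumption for the demo.

```shell
#!/usr/bin/env bash
# Sketch: collect *.md files under INPUT_DIR into an array, NUL-delimited,
# without modifying IFS globally.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
INPUT_DIR=$demo_dir   # hypothetical input directory for this demo
mkdir -p "$INPUT_DIR/some dir"
touch "$INPUT_DIR/read me.md" "$INPUT_DIR/some dir/notes.md" "$INPUT_DIR/skip.txt"

files=()
while IFS= read -r -d '' filename; do
    files+=("$filename")   # one element per path, spaces (and newlines) intact
done < <(find "$INPUT_DIR" -type f -name '*.md' -print0)

for filename in "${files[@]}"; do
    printf 'Processing: %s\n' "$filename"
    # do your stuff
done
```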


Sorin
  • Thx for constructive remark, but : 1- this is an actual problem, 2- shell could have evolved in the time ... as everybody i assume; 3- None answer above could satisfy a DIRECT resolution of the pb without changing the problem or disserting :-) – Vince B Jun 24 '19 at 06:49