
Basically I'm searching for a multi-word file that is present in many directories using the find command, and the output is stored in a variable vari

    vari = `find -name "multi word file.xml"`

When I try to delete the file using a for loop to iterate through it:

    for file in ${vari[@]}

the execution fails, saying:

    rm: cannot remove `/abc/xyz/multi': No such file or directory

Could you guys please help me with this scenario??

Shreyas Athreya

    Possible duplicate of [Bash : iterate over list of files with spaces](http://stackoverflow.com/questions/7039130/bash-iterate-over-list-of-files-with-spaces) – Biffen Apr 02 '16 at 12:49
    It is not a duplicate; the other question deals with lists and word splitting, this question deals with word splitting for arrays. – eckes Apr 02 '16 at 16:17

3 Answers

  • If you really need to capture all file paths in an array up front (assumes bash, primarily due to use of arrays and process substitution (<(...))[1]; a POSIX-compliant solution would be more cumbersome[2]; also note that this is a line-based solution, so it won't handle filenames with embedded newlines correctly, but that's very rare in practice):
# Read matches into array `vari` - safely: no word splitting, no
# globbing. The only caveat is that filenames with *embedded* newlines
# won't be handled correctly, but that's rarely a concern.
# bash 4+:
readarray -t vari < <(find . -name "multi word file.xml")
# bash 3:
IFS=$'\n' read -r -d '' -a vari < <(find . -name "multi word file.xml")

# Invoke `rm` with all array elements:
rm "${vari[@]}"  # !! The double quotes are crucial.
  • Otherwise, let find perform the deletion directly (these solutions also handle filenames with embedded newlines correctly):
find . -name "multi word file.xml" -delete

# If your `find` implementation doesn't support `-delete`:
find . -name "multi word file.xml" -exec rm {} +
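For more complicated processing of the matches, a NUL-delimited pipeline (the `find -print0 | xargs -0` extension eckes mentions in the comments) is also safe for all filenames, including ones with embedded newlines; a minimal sketch, assuming a `find`/`xargs` pair that supports `-print0`/`-0` (GNU and BSD do):

```shell
# Paths are emitted NUL-separated and split on NUL, so spaces and even
# newlines embedded in filenames pass through intact; `--` guards
# against filenames that begin with `-`.
find . -name "multi word file.xml" -print0 | xargs -0 rm --
```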

As for what you tried:

  • vari=`find -name "multi word file.xml"` (I've removed the spaces around =, which would result in a syntax error) does not create an array; such a command substitution returns the stdout output from the enclosed command as a single string (with trailing newlines stripped).

    • By enclosing the command substitution in ( ... ), you could create an array:
      vari=( `find -name "multi word file.xml"` ),
      but that would perform word splitting on the find's output and not properly preserve filenames with spaces.
    • While this could be addressed with IFS=$'\n' so as to only split at line boundaries, the resulting tokens are still subject to pathname expansion (globbing), which can inadvertently alter the file paths.
    • While this could also be addressed with a shell option, you now have two settings you need to change ahead of time and restore to their original values; thus, using readarray or read as demonstrated above is the simpler choice.
  • Even if you did manage to collect the file paths correctly in $vari as an array, referencing that array as ${vari[@]} - without double quotes - would break, because the resulting strings are again subject to word splitting, and also pathname expansion (globbing).

    • To safely expand an array to its elements without any interpretation of its elements, double-quote it: "${vari[@]}"
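A self-contained sketch of the difference (the temporary directory and file name here are made up for illustration):

```shell
# Store one path that contains spaces in an array, then compare
# unquoted vs. quoted expansion by printing each argument in <...>.
tmpdir=$(mktemp -d)
touch "$tmpdir/multi word file.xml"
vari=( "$tmpdir/multi word file.xml" )

# Unquoted: the single element is word-split on whitespace, so a
# command such as rm would see several bogus arguments.
printf '<%s>\n' ${vari[@]}

# Quoted: the element is passed as exactly one argument.
printf '<%s>\n' "${vari[@]}"

rm -rf -- "$tmpdir"
```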

[1]

Process substitution rather than a pipeline is used so as to ensure that readarray / read is executed in the current shell rather than in a subshell.

As eckes points out in a comment, if you were to try find ... | IFS=$'\n' read ... instead, read would run in a subshell, which means that the variables it creates will disappear (go out of scope) when the command returns and cannot be used later.
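A quick bash demonstration (`v1` and `v2` are made-up variable names): the pipeline version leaves `v1` empty in the current shell, while the process-substitution version preserves `v2`:

```shell
unset v1 v2

# `read` runs in a subshell here; v1 vanishes when the pipeline ends.
printf 'hello\n' | read -r v1

# `read` runs in the current shell here; v2 persists.
read -r v2 < <(printf 'hello\n')

echo "v1=[$v1] v2=[$v2]"
```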

[2]

The POSIX shell spec. supports neither arrays nor process substitution (nor readarray, nor any read options other than -r); you'd have to implement line-by-line processing as follows:

while IFS='
' read -r vari; do
  # Process "$vari" here; e.g., to delete: rm -- "$vari"
  printf '%s\n' "$vari"
done <<EOF
$(find . -name "multi word file.xml")
EOF

Note the required actual newline between IFS=' and ' in order to assign a newline, given that the $'\n' syntax is not available.

mklement0

    Good answer. Since it explains everything in detail, I might want to add that `find | IFS=$'\n' read ...` would seem the better syntax, but this way read cannot create the array in the main shell (only in a subshell of the pipe, and therefore it is not set afterwards). – eckes Apr 02 '16 at 16:12
    Let me add to the answer that when you want more complicated processing of the results you can also use the safe `find -print0 | xargs -0` extension. In that case you need a shell script to process the positional arguments. It is not only safe against all possible file names, it also allows working with very large directories in multiple processes (and even in parallel). – eckes Apr 02 '16 at 16:14
    @eckes: Thanks; I've added the subshell caveat to the answer; re `-print0` + `xargs -0`: good tip. – mklement0 Apr 02 '16 at 16:34
  • @ShreyasAthreya: Glad to hear it; I suggest you use the tag `bash` explicitly in the future rather than the generic tag `shell` - the latter by itself suggests that you're looking for a POSIX-compliant solution. – mklement0 Apr 02 '16 at 17:13
  • @mklement0 I can say I learnt something today, thank you - particularly the POSIX considerations are very interesting. I tested the above 3 solutions, which all failed; see my answer, without a solution but with the tests. – Jay jargot Apr 02 '16 at 20:06
  • @Jayjargot: Yes, the `readarray` / `read` solutions are _line_-based, and therefore do not handle filenames with _embedded_ newlines correctly; by contrast, the `find` solutions with `-delete` and `-exec rm {} +` do. Given how rare filenames with embedded newlines are, I didn't mention this initially, but I've added a note. – mklement0 Apr 02 '16 at 22:58

Here are a few approaches:

# change the input field separator to a newline to ignore spaces
IFS=$'\n'
for file in $(find . -name '* *.xml'); do
    ls "$file"
done

# pipe find result lines to a while loop
find . -name '* *.xml' | while IFS= read -r file; do
    ls "$file"
done

# feed the while loop with process substitution
while IFS= read -r file; do
    ls "$file"
done < <(find . -name '* *.xml')

When you're satisfied with the results, replace ls with rm.
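For instance, the process-substitution variant with the deletion in place might look like this (a sketch; `--` guards against filenames that begin with `-`, and `while IFS= read -r` localizes the `IFS` change to the `read` command):

```shell
while IFS= read -r file; do
    rm -- "$file"
done < <(find . -name '* *.xml')
```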

Cole Tierney

  • It's worth mentioning that even though the `IFS=$'\n'` prevents line-internal _word splitting_, the result of the command substitution is still subject to _globbing_. Adding `-r` to `read` is always a good idea, but, more importantly, by not prepending `IFS=` you'll end up stripping leading and trailing whitespace from each input line (which would be unusual in filenames, but it's a good practice to encourage if you want to read lines _as-is_). – mklement0 Apr 02 '16 at 15:40
  • There were two separate tips: `-r` to prevent interpretation of backslashes, and setting `IFS` to the null (empty) string to prevent removal of per-line leading and trailing whitespace; applied to your example: `echo $' a\\bc \n\txyz\t' | while IFS= read -r f; do echo "$f"; done` (note that I've added a backslash after `a` to demonstrate that `-r` works as intended). – mklement0 Apr 02 '16 at 17:19
    Thanks, I see now. – Cole Tierney Apr 02 '16 at 19:00
  • @ColeTierney Thank you for these solutions. When tested, they failed; see my answer without a solution. What do you think? – Jay jargot Apr 02 '16 at 20:22
  • Jay, I've edited my answer as per @mklement0's suggestions. Does this affect your results? – Cole Tierney Apr 02 '16 at 21:40
  • @ColeTierney The very first solution is working when there is one file retrieved with the **find** in the weird test environment, but it is failing when there are 2 files. Thank you for demonstrating that, when there is one file, the **rm** command can be used with a variable. – Jay jargot Apr 02 '16 at 23:11
  • @ColeTierney: Both my `readarray` / `read` solutions and all of yours are invariably _line_-based, so by definition they cannot handle filenames with _embedded_ newlines, which is what made Jay's tests fail. – mklement0 Apr 02 '16 at 23:13
  • @ColeTierney: unfortunately, the edits you made weren't quite what I had in mind: the `for` solution still needs `IFS=$'\n'` to perform line-by-line processing (but, as stated, the result is still subject to globbing); for the remaining solutions, `while IFS= read -r file` is preferable, because it localizes the `IFS` change. – mklement0 Apr 02 '16 at 23:15
  • @Jayjargot: Cole's `for` solution presently mistakenly reads the _entire_ `find` output _at once_, due to setting `IFS` to the _empty_ string rather than to `$'\n'` - that's why it _happens to work for a single file only_ (albeit even for one whose path includes newlines). Given this fundamental limitation, it is _not_ a viable solution. – mklement0 Apr 02 '16 at 23:40
    Thanks @mklement0. I didn't test my for loop example with a null IFS. I've switched it back to \n. – Cole Tierney Apr 03 '16 at 00:20

The solutions in the other answers are all line-based by design. There is a test environment at the bottom for which there is no known solution.

As already written, the file could be removed with this tested command:

$ find . -name "multi word file".xml -exec rm {} +

I did not manage to use the rm command with a variable when the path or filename contains \n.

Test environment:

$ mkdir "$(printf "\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47testdir" "")"
$ touch "multi word file".xml
$ mv *xml *testdir/
$ touch "2nd multi word file".xml ; mv *xml *testdir
$ ls -b
\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\ !"#$%&'testdir
$ ls -b *testdir
2nd\ multi\ word\ file.xml  multi\ word\ file.xml
Jay jargot

  • Could you please explain to me what the above statement does? – Shreyas Athreya Apr 02 '16 at 13:54
  • I mean please elaborate – Shreyas Athreya Apr 02 '16 at 13:54
    The for loop over arrays has no problems with space in the array members as long as you use the proper `"${arr[@]}"` (quoted) form. The assigned loop variable must be quoted the same way then. – eckes Apr 02 '16 at 16:16
  • Yes, the `readarray` / `read` solutions are _by design_ _line_-based, and therefore cannot handle filenames with _embedded_ newlines correctly; the same goes for Cole's solutions. Thus, if you remove `\12` - which represents a newline - from your sample directory name, the tests will work. Note that filenames with embedded newlines are very rare in practice, and you're likely to encounter problems in many situations with them; I've since added a note to my answer. – mklement0 Apr 02 '16 at 23:07
    @mklement0 true, this is rare and I personally have never seen that, except maybe after a mistake. **\n** in filenames seems indeed to make many commands fail. – Jay jargot Apr 02 '16 at 23:13
  • While I appreciate that you pointed out the existing answer's limitations with respect to embedded newlines, you can save readers time by mentioning at the top that these answers are _by design line-based_ - in other words: it's a built-in limitation that _does not require tests to demonstrate_. Cole's first example currently works with a single matching file with embedded newlines, due to setting `IFS` to the null string in mistaken response to my hints. However, given that this accidental solution only ever works with _any single_ matching file, it is _not_ worth mentioning in your answer. – mklement0 Apr 02 '16 at 23:54
  • The answer is much shorter now. You guys have found solutions to the question, there is no doubt. My point is that I do not know whether it is even possible to find a solution that would work in all cases. – Jay jargot Apr 03 '16 at 07:59