What idiom should one use in Bash scripts (no Perl, Python, and such please) to build up a command line for another program out of the script's arguments while handling filenames correctly?
By correctly, I mean handling filenames with spaces or odd characters without inadvertently causing the other program to handle them as separate arguments (or, in the case of < or >, which are, after all, valid if unfortunate filename characters if properly escaped, doing something even worse).
Here's a made-up example of what I mean, in a form that doesn't handle filenames correctly. Let's assume this script (foo) builds up a command line for a command (bar, assumed to be in the path) by taking all of foo's input arguments and moving anything that looks like a flag to the front, and then invoking bar:
#!/bin/bash
# This is clearly wrong
FILES=
FLAGS=
for ARG in "$@"; do
    echo "foo: Handling $ARG"
    if [ x${ARG:0:1} = "x-" ]; then
        # Looks like a flag, add it to the flags string
        FLAGS="$FLAGS $ARG"
    else
        # Looks like a file, add it to the files string
        FILES="$FILES $ARG"
    fi
done
# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD="bar $FLAGS $FILES"
echo "Issuing: $CMD"
$CMD
(Note that this is just an example; there are lots of other times one needs to do this and that to a bunch of args and then pass them on to other programs.)
In a naive scenario with simple filenames, that works great. But if we assume a directory containing the files
one
two
three and a half
four < five
then of course the command foo * fails miserably in its task:
foo: Handling four < five
foo: Handling one
foo: Handling three and a half
foo: Handling two
Issuing: bar four < five one three and a half two
If we actually allow foo to issue that command, well, the results won't be what we're expecting.
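To make the failure concrete, here is a minimal sketch of what the unquoted $CMD expansion does to awkward filenames. count_args is a hypothetical helper invented for this illustration (it is not part of foo or bar); it only reports how many arguments it received. As far as I can tell, a < inside the expanded string reaches the program as a literal argument, and would only act as a redirection if the string were parsed a second time, for example by eval.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

name="three and a half"

# String approach: the unquoted expansion is split on whitespace,
# so the single filename arrives as four separate arguments.
CMD="count_args $name"
string_count=$($CMD)

# Quoted expansion: the filename stays one argument.
quoted_count=$(count_args "$name")

# A filename containing "<": after expansion the "<" is just another word,
# so this arrives as three arguments rather than triggering a redirection.
CMD="count_args four < five"
angle_count=$($CMD)

echo "via string: $string_count, quoted: $quoted_count, with <: $angle_count"
```

The point of the sketch is only that the string form loses the boundaries between arguments, and no amount of quoting inside the string puts them back once the shell has split it.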
Previously I've tried to handle this through the simple expedient of ensuring that there are quotes around each filename, but I've (very) quickly learned that that is not the correct approach. :-)
So what is? Constraints:
1. I want to keep the idiom as simple as possible (not least so I can remember it).
2. I'm looking for a general-purpose idiom, hence my making up the bar program and the contrived example above instead of using a real scenario where people might easily (and reasonably) go down the route of trying to use features in the target program.
3. I want to stick to Bash script, I don't want to call out to Perl, Python, etc.
4. I'm fine with relying on (other) standard *nix utilities, like xargs, sed, or tr, provided we don't get too obtuse (see #1 above). (Apologies to Perl, Python, etc. programmers who think #3 and #4 combine to draw an arbitrary distinction.)
5. If it matters, the target program might also be a Bash script, or might not. I wouldn't expect it to matter...
6. I don't just want to handle spaces, I want to handle weird characters correctly as well.
7. I'm not bothered if it doesn't handle filenames with embedded nul characters (literally character code 0). If someone's managed to create one in their filesystem, I'm not worried about handling it, they've tried really hard to mess things up.
Thanks in advance, folks.
Edit: Ignacio Vazquez-Abrams pointed me to Bash FAQ entry #50, which after some reading and experimentation seems to indicate that one way is to use Bash arrays:
#!/bin/bash
# This appears to work, using Bash arrays
# Start with blank arrays
FILES=()
FLAGS=()
for ARG in "$@"; do
    echo "foo: Handling $ARG"
    # Quote the test operand so a leading glob character or space
    # in the argument cannot expand or split here
    if [ "x${ARG:0:1}" = "x-" ]; then
        # Looks like a flag, add it to the flags array
        FLAGS+=("$ARG")
    else
        # Looks like a file, add it to the files array
        FILES+=("$ARG")
    fi
done
# Call bar with the flags and files
echo "Issuing (but properly delimited, not exactly as this appears): bar ${FLAGS[@]} ${FILES[@]}"
bar "${FLAGS[@]}" "${FILES[@]}"
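Since the echo line cannot show where one argument ends and the next begins, one option is to print the command with the %q format of Bash's printf builtin, which quotes each argument in a form the shell could read back. This is a sketch only; the sample array contents and the SHOWN variable name are made up for illustration.

```bash
#!/bin/bash
# Sample values standing in for the arrays built by the loop (assumed for illustration).
FLAGS=(-v)
FILES=("three and a half" "four < five")

# %q shell-quotes each argument; -v stores the result in SHOWN instead of printing it.
printf -v SHOWN '%q ' bar "${FLAGS[@]}" "${FILES[@]}"
echo "Issuing: $SHOWN"
```

The printed line is for display only; the actual call still goes through the quoted array expansions.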
Is that correct and reasonable? Or am I relying on something environmental above that will bite me later? It seems to work and it ticks all the other boxes for me (simple, easy to remember, etc.). It does appear to rely on a relatively recent Bash feature (FAQ entry #50 mentions v3.1, but I wasn't sure whether that was arrays in general or some of the syntax they were using with it), but I think it's likely I'll only be dealing with versions that have it.
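One way to check the array version without a real bar is to substitute a stand-in that prints each argument it receives on its own line, in brackets. In this sketch both the bar function and the foo wrapper function are hypothetical, written only for this test, and the flag check uses a [[ ]] pattern match rather than the substring test in the script.

```bash
#!/bin/bash
# Stand-in for the real bar (hypothetical): prints each argument in brackets.
bar() { printf '[%s]\n' "$@"; }

# Hypothetical wrapper containing the same flag/file sorting loop.
foo() {
    local FILES=() FLAGS=() ARG
    for ARG in "$@"; do
        if [[ $ARG == -* ]]; then
            FLAGS+=("$ARG")   # looks like a flag
        else
            FILES+=("$ARG")   # looks like a file
        fi
    done
    # Quoted [@] expansion passes each array element as exactly one argument.
    bar "${FLAGS[@]}" "${FILES[@]}"
}

RESULT=$(foo "four < five" one -x "three and a half" two)
echo "$RESULT"
```

If every filename shows up inside a single pair of brackets, the argument boundaries have survived. On the version question, my understanding is that arrays themselves are older than 3.1 and that the += append form is the newer part; FILES=("${FILES[@]}" "$ARG") would be the older spelling of the same append.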
(If the above is correct and you want to un-delete your answer, Ignacio, I'll accept it provided I haven't accepted any others yet, although I stand by my statement about link-only answers.)