
What idiom should one use in Bash scripts (no Perl, Python, and such please) to build up a command line for another program out of the script's arguments while handling filenames correctly?

By correctly, I mean handling filenames with spaces or odd characters without inadvertently causing the other program to handle them as separate arguments (or, in the case of < or > — which are, after all, valid if unfortunate filename characters if properly escaped — doing something even worse).

Here's a made-up example of what I mean, in a form that doesn't handle filenames correctly: Let's assume this script (foo) builds up a command line for a command (bar, assumed to be in the path) by taking all of foo's input arguments and moving anything that looks like a flag to the front, and then invoking bar:

#!/bin/bash
# This is clearly wrong

FILES=
FLAGS=
for ARG in "$@"; do
    echo "foo: Handling $ARG"
    if [ x${ARG:0:1} = "x-" ]; then
        # Looks like a flag, add it to the flags string
        FLAGS="$FLAGS $ARG"
    else
        # Looks like a file, add it to the files string
        FILES="$FILES $ARG"
    fi
done

# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD="bar $FLAGS $FILES"
echo "Issuing: $CMD"
$CMD

(Note that this is just an example; there are lots of other times one needs to do this and that to a bunch of args and then pass them on to other programs.)

In a naive scenario with simple filenames, that works great. But if we assume a directory containing the files

one
two
three and a half
four < five

then of course the command foo * fails miserably in its task:

foo: Handling four < five
foo: Handling one
foo: Handling three and a half
foo: Handling two
Issuing: bar   four < five one three and a half two

If we actually allow foo to issue that command, well, the results won't be what we're expecting.

Previously I've tried to handle this through the simple expedient of ensuring that there are quotes around each filename, but I've (very) quickly learned that that is not the correct approach. :-)

So what is? Constraints:

  1. I want to keep the idiom as simple as possible (not least so I can remember it).
  2. I'm looking for a general-purpose idiom, hence my making up the bar program and the contrived example above instead of using a real scenario where people might easily (and reasonably) go down the route of trying to use features in the target program.
  3. I want to stick to Bash script, I don't want to call out to Perl, Python, etc.
  4. I'm fine with relying on (other) standard *nix utilities, like xargs, sed, or tr provided we don't get too obtuse (see #1 above). (Apologies to Perl, Python, etc. programmers who think #3 and #4 combine to draw an arbitrary distinction.)
  5. If it matters, the target program might also be a Bash script, or might not. I wouldn't expect it to matter...
  6. I don't just want to handle spaces, I want to handle weird characters correctly as well.
  7. I'm not bothered if it doesn't handle filenames with embedded nul characters (literally character code 0). If someone's managed to create one in their filesystem, I'm not worried about handling it, they've tried really hard to mess things up.

Thanks in advance, folks.


Edit: Ignacio Vazquez-Abrams pointed me to Bash FAQ entry #50, which after some reading and experimentation seems to indicate that one way is to use Bash arrays:

#!/bin/bash
# This appears to work, using Bash arrays

# Start with blank arrays
FILES=()
FLAGS=()
for ARG in "$@"; do
    echo "foo: Handling $ARG"
    if [ x${ARG:0:1} = "x-" ]; then
        # Looks like a flag, add it to the flags array
        FLAGS+=("$ARG")
    else
        # Looks like a file, add it to the files array
        FILES+=("$ARG")
    fi
done

# Call bar with the flags and files
echo "Issuing (but properly delimited, not exactly as this appears): bar ${FLAGS[@]} ${FILES[@]}"
bar "${FLAGS[@]}" "${FILES[@]}"

Is that correct and reasonable? Or am I relying on something environmental above that will bite me later? It seems to work and it ticks all the other boxes for me (simple, easy to remember, etc.). It does appear to rely on a relatively recent Bash feature (FAQ entry #50 mentions v3.1, but I wasn't sure whether that was arrays in general or some of the syntax they were using with it), but I think it's likely I'll only be dealing with versions that have it.

(If the above is correct and you want to un-delete your answer, Ignacio, I'll accept it provided I haven't accepted any others yet, although I stand by my statement about link-only answers.)

T.J. Crowder
  • The reference to version 3.1 concerns the `+=` syntax for adding elements to an array. I *believe* that you can do `array=("${array[@]}" '111 222' 333 '444 555')` in earlier versions (it works in current ones). – Dennis Williamson Nov 11 '10 at 16:19
  • @Dennis: Found a reference for that, entry 005 in the same FAQ: http://mywiki.wooledge.org/BashFAQ/005 – T.J. Crowder Nov 11 '10 at 17:30

3 Answers


Why do you want to "build up" a command? Add the files and flags to arrays using proper quoting and issue the command directly using the quoted arrays as arguments.

Selected lines from your script (omitting unchanged ones):

if [[ ${ARG:0:1} == - ]]; then    # using a Bash idiom
FLAGS+=("$ARG")                   # add an element to an array
FILES+=("$ARG")
echo "Issuing: bar \"${FLAGS[@]}\" \"${FILES[@]}\""
bar "${FLAGS[@]}" "${FILES[@]}"

For a quick demo of using arrays in this manner:

$ a=(aaa 'bbb ccc' ddd); for arg in "${a[@]}"; do echo "..${arg}.."; done

Output:

..aaa..
..bbb ccc..
..ddd..

Please see BashFAQ/050 regarding putting commands in variables. Your script doesn't work because there's no way to quote the arguments within a quoted string. If you were to put quotes there, they would be treated as part of the string itself rather than as delimiters. With the arguments left unquoted, word splitting is performed, and arguments that include spaces are seen as more than one argument. Arguments containing "<", ">" or "|" are not a problem in any case: redirection and pipe operators are recognized when the command line is parsed, before variable expansion, so when they arrive via an expanded variable they are just characters in a string.
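To make that paragraph concrete, here is a minimal sketch. The `count_args` function is a hypothetical helper introduced only for this illustration (it is not part of the original script); it simply reports how many arguments it received.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

file='three and a half'

# Quotes embedded in a string are literal characters, not delimiters,
# so the unquoted expansion below is still split on whitespace.
cmd_string="count_args \"$file\""
string_count=$($cmd_string)

# A "<" that arrives through expansion is plain data, not a redirection,
# but the unquoted value is still split into separate words.
tricky='four < five'
tricky_count=$(count_args $tricky)

echo "string_count=$string_count tricky_count=$tricky_count"
```

The first count is 4 rather than 1 because the embedded quotes end up attached to the words `"three` and `half"`; the second is 3 because the filename is split into `four`, `<` and `five`.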

By putting the arguments (filenames) in an array, spaces, newlines, etc., are preserved. By quoting the array variable when it's passed as an argument, they are preserved on the way to the consuming program.
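The difference between the quoted and unquoted array expansion can be sketched as follows. `show_args` is a hypothetical stand-in for the consuming program, and the filenames are taken from the question plus one made-up name containing a newline.

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for the consuming program.
show_args() { printf '<%s>' "$@"; }

files=('three and a half' 'four < five' $'line one\nline two')

# Quoted expansion: exactly one argument per array element.
quoted=$(show_args "${files[@]}")

# Unquoted expansion: word splitting breaks the elements apart again.
unquoted_count=$(set -- ${files[@]}; echo "$#")

echo "elements: ${#files[@]}"
echo "words after unquoted expansion: $unquoted_count"
```

The array holds 3 elements, but the unquoted expansion yields 11 words, which is why the quotes around `"${files[@]}"` matter as much as the array itself.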

Some additional notes:

  • Use lowercase (or mixed case) variable names to reduce the chance that they will collide with the shell's builtin variables.
  • If you use single square brackets for conditionals in any modern shell, the archaic "x" idiom is no longer necessary if you quote the variables (see my answer here). However, in Bash, use double brackets. They provide additional features (see my answer here).
  • Use getopts as Let_Me_Be suggested. Your script, though I know it's only an example, will not be able to handle switches that take arguments.
  • `for ARG in "$@"` can be shortened to `for ARG` (but I prefer the readability of the more explicit version).
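As a rough illustration of the `getopts` note above, here is a sketch that handles a switch taking an argument while still collecting everything into arrays. The function name `parse_args` and the options `-v` and `-o` are made up for this example; a real script would use whatever options the target program defines. Note that `getopts` stops at the first non-option argument, so this does not reorder flags the way the example script in the question does.

```shell
#!/bin/bash
# parse_args is a hypothetical function; -v and -o are made-up options.
parse_args() {
    local opt OPTIND=1
    flags=()
    while getopts 'vo:' opt; do
        case $opt in
            v) flags+=(-v) ;;
            o) flags+=(-o "$OPTARG") ;;   # the option's argument stays one word
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    files=("$@")                          # whatever is left is treated as files
}

parse_args -v -o 'out file.txt' 'three and a half' 'four < five'
printf 'flag: %s\n' "${flags[@]}"
printf 'file: %s\n' "${files[@]}"
```

The resulting arrays can then be passed on as `"${flags[@]}" "${files[@]}"`, exactly as in the selected lines earlier in this answer.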
Dennis Williamson
  • Thanks, that's exactly what I needed. I'd gotten mostly there from Ignacio's deleted answer, but much better to have the full picture as you've provided. Much appreciated. (The options thing really is a red herring, I tried -- unsuccessfully, it would appear -- to pick something innocuous to do to the args before passing them on, purely for the purposes of an example!) – T.J. Crowder Nov 11 '10 at 16:03

See BashFAQ #50 (and also maybe #35 on option parsing). For the scenario you describe, where you're building a command dynamically, the best option is to use arrays rather than simple strings, as they won't lose track of where the word boundaries are. The general rules are: to create an array, instead of VAR="foo bar baz", use VAR=("foo" "bar" "baz"); to use the array, instead of $VAR, use "${VAR[@]}". Here's a working version of your example script using this method:

#!/bin/bash
# Working version, using Bash arrays

FILES=()
FLAGS=()
for ARG in "$@"; do
    echo "foo: Handling $ARG"
    if [ "x${ARG:0:1}" = "x-" ]; then   # quoted so the test sees a single word
        # Looks like a flag, add it to the flags array
        FLAGS=("${FLAGS[@]}" "$ARG") # FLAGS+=("$ARG") would also work in bash 3.1+, as Dennis pointed out
    else
        # Looks like a file, add it to the files array
        FILES=("${FILES[@]}" "$ARG")
    fi
done

# Call bar with the flags and files (each array element is passed
# as exactly one argument)
CMD=("bar" "${FLAGS[@]}" "${FILES[@]}")
echo "Issuing: ${CMD[*]}"
"${CMD[@]}"

Note that in the echo command I used "${VAR[*]}" instead of the [@] form because there's no need/point to preserving word breaks here. If you wanted to print/record the command in unambiguous form, this would be a lot messier.
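If an unambiguous form is wanted, one option (a sketch, not the only way) is the `%q` format of Bash's `printf` builtin, which quotes each argument so that the line could be pasted back into the shell. The array contents below are illustrative; `bar` is never executed here.

```shell
#!/bin/bash
# Illustrative command array; bar is only printed, never run.
cmd=(bar -x 'three and a half' 'four < five')

# %q quotes each element for reuse as shell input; -v stores the result
# in a variable instead of printing it.
printf -v printable '%q ' "${cmd[@]}"
echo "Issuing: $printable"
```

The exact escaping style is chosen by Bash (typically backslashes before spaces and special characters), so the printed line is meant for logging or copy-and-paste, not for pretty display.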

Also, this gives you no way to build up redirections or other special shell options in the built command -- if you add >outfile to the FILES array, it'll be treated as just another command argument, not a shell redirection. If you need to programmatically build these, be prepared for headaches.
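A small sketch of that limitation, and of one possible workaround, follows. `show_args` is a hypothetical stand-in for `bar`, and the temporary file comes from `mktemp`; both are assumptions made only for this illustration.

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for bar.
show_args() { printf '<%s>' "$@"; echo; }

# A redirection stored in the array is passed along as an ordinary argument.
cmd=(show_args one '>outfile')
literal=$("${cmd[@]}")
echo "$literal"

# One workaround: keep the target in a variable and write the redirection
# operator in the script itself, outside the array.
outfile=$(mktemp)
cmd=(show_args one)
"${cmd[@]}" >"$outfile"
redirected=$(<"$outfile")
rm -f "$outfile"
echo "$redirected"
```

This only covers the case where the script knows in advance that a redirection may be needed; building arbitrary redirections or pipelines at run time remains awkward.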

Gordon Davisson

getopts should be able to handle spaces in arguments correctly ("file name.txt"). Weird characters should work as well, assuming they are correctly escaped (ls -b).

Šimon Tóth
  • How would you apply `getopts` to the above? – T.J. Crowder Nov 11 '10 at 15:08
  • Reading it once more, you actually don't need `getopts`. Just use `ls -b` to get the file list. – Šimon Tóth Nov 11 '10 at 15:14
  • @Let_Me_Be: I'm not looking for a file list. I'm looking to properly pre-process arguments, including but not limited to filenames, and handle filenames correctly when passing them on to the next command. – T.J. Crowder Nov 11 '10 at 15:18
  • @TJ Well, once you receive correctly escaped parameters, you shouldn't be able to break them. `four < five` isn't properly escaped. And you shouldn't even be able to pass something like this as an argument in the first place. – Šimon Tóth Nov 11 '10 at 15:22
  • @Let_Me_Be: Sure I can: `echo "Hi there">"four < five"` creates a file called `four < five`. If I then were silly enough to run my `foo` script above, `bar` would get called and try to do the input redirect from the `five` file. (Yes, it would be **really silly** to put those characters in a filename.) – T.J. Crowder Nov 11 '10 at 15:27
  • @TJ Yes, `"four < five"` is properly escaped. So you want to know how to maintain the `"` character? – Šimon Tóth Nov 11 '10 at 15:29
  • @Let_Me_Be: As I said in the question, I want to call `bar` with the arguments properly delimited. I don't think that's a literal matter of maintaining a quote character, but then again, I'm the one asking the question, so... – T.J. Crowder Nov 11 '10 at 15:47