I have a basic understanding of how bash splits a line into arguments for a program, enough to avoid problems with arguments containing spaces, but I would like to go that extra step and understand what is happening and why. Most guides tell you what to do, but not why it works. Some examples might help to explain...
I'll be using this short Python script to dump argument lists:
#!/usr/bin/env python3
import sys
# Print every argument after the script name as a Python list.
print(sys.argv[1:])
Let's call it "dumpargs". (You could write it in C or even bash, but Python is concise enough and I don't want to confuse matters by contending with an extra layer of bash interpreting and expanding strings.)
First of all, some easy examples:
$ dumpargs foo bar baz
['foo', 'bar', 'baz']
$ dumpargs "foo bar" baz
['foo bar', 'baz']
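For comparison, Python's standard `shlex.split` (in POSIX mode, its default) tokenizes a command line using rules modeled on the POSIX shell, so it reproduces these two results without launching bash. It only handles quoting. It performs no variable, array, or glob expansion, so treat it as a sketch of bash's tokenizing step rather than a stand-in for bash. The helper name `dumpargs_line` is made up for this example.

```python
import shlex


def dumpargs_line(line):
    """Hypothetical helper: tokenize a command line like a POSIX shell
    (quotes only, no expansions) and drop the command name, as dumpargs
    does with sys.argv[1:]."""
    return shlex.split(line)[1:]


print(dumpargs_line('dumpargs foo bar baz'))    # ['foo', 'bar', 'baz']
print(dumpargs_line('dumpargs "foo bar" baz'))  # ['foo bar', 'baz']
```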
Okay, great. We can use quotes to pass an argument that contains spaces by wrapping the quotes around it. But we're not restricted to putting the quotes on the outside of the argument. What if we put them in the middle?
$ dumpargs foo" "bar
['foo bar']
$ dumpargs foo" "bar" "baz xyzzy
['foo bar baz', 'xyzzy']
Okay, cool. I think this demonstrates that the quotes just modify how the spaces are interpreted: spaces occurring between double quotes aren't argument separators. Unquoted spaces become separators, quoted spaces become literal spaces, and the quote characters themselves evaporate.
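That rule is small enough to write down directly. Below is a minimal Python sketch of it, assuming only double quotes and plain spaces matter. It deliberately ignores backslashes, single quotes, tabs, and every kind of expansion, and `split_words` is a hypothetical name, not anything bash exposes.

```python
def split_words(line):
    """Sketch of the rule above: unquoted spaces separate words, quoted
    spaces are ordinary characters, and the double quotes disappear.
    Only '"' and ' ' are special; everything else is literal text."""
    words = []
    current = []
    in_word = False  # has a word started (possibly an empty "" one)?
    quoted = False   # are we currently between double quotes?
    for ch in line:
        if ch == '"':
            quoted = not quoted  # the quote character itself is dropped...
            in_word = True       # ...but it still marks that a word exists
        elif ch == ' ' and not quoted:
            if in_word:          # an unquoted space ends the current word
                words.append(''.join(current))
                current = []
                in_word = False
        else:
            current.append(ch)   # includes spaces that appear inside quotes
            in_word = True
    if quoted:
        raise ValueError('unterminated double quote')
    if in_word:
        words.append(''.join(current))
    return words


print(split_words('foo" "bar'))              # ['foo bar']
print(split_words('foo" "bar" "baz xyzzy'))  # ['foo bar baz', 'xyzzy']
```

The `in_word` flag is what makes `""` produce one empty argument rather than none, which matches what bash does with `dumpargs ""`.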
What about arrays?
$ xs=(one two "buckle my shoe")
$ dumpargs ${xs[*]}
['one', 'two', 'buckle', 'my', 'shoe']
$ dumpargs ${xs[@]}
['one', 'two', 'buckle', 'my', 'shoe']
$ dumpargs "${xs[*]}"
['one two buckle my shoe']
$ dumpargs "${xs[@]}"
['one', 'two', 'buckle my shoe']
Clearly the last of the four is most generally useful, and most likely what we want to use in places where our array represents, say, a list of filenames. The others all confuse the spaces in "buckle my shoe" with the separators between array elements. But what is it actually doing? It looks like it's composed of a variable expansion and a quoting operation. Is it? Or does bash just use special treatment for the case when it sees double quotes immediately surrounding an array expansion?
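The bash manual documents the two quoted forms: `"${xs[*]}"` expands to a single word with the elements joined by the first character of `IFS`, while `"${xs[@]}"` expands each element to a separate word. Unquoted, both forms yield one field per element, and those fields then go through word splitting. Here is a rough Python model of the four cases, assuming the default `IFS` (space, tab, newline) and ignoring pathname (glob) expansion. `expand` is a hypothetical helper for illustration only.

```python
import re

IFS = ' \t\n'  # assumption: bash's default IFS, never changed


def expand(xs, form, quoted):
    """Model expanding array xs as ${xs[*]} or ${xs[@]} (form is '*' or
    '@'), with or without surrounding double quotes. Globbing is ignored."""
    if quoted:
        if form == '*':
            return [IFS[0].join(xs)]  # one word, elements joined by a space
        return list(xs)               # one word per element, never split
    # Unquoted: both forms give one field per element, and each field is
    # then word-split on runs of IFS whitespace (empty results are dropped).
    return [w for x in xs for w in re.split('[ \t\n]+', x) if w]


xs = ['one', 'two', 'buckle my shoe']
print(expand(xs, '*', False))  # ['one', 'two', 'buckle', 'my', 'shoe']
print(expand(xs, '@', False))  # ['one', 'two', 'buckle', 'my', 'shoe']
print(expand(xs, '*', True))   # ['one two buckle my shoe']
print(expand(xs, '@', True))   # ['one', 'two', 'buckle my shoe']
```

In this model the quotes do only one job: they switch off the word-splitting step. The difference between `*` and `@` is decided before that, by whether the elements are glued into one string or kept as separate fields.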
Here are some more examples to try to test what's going on:
$ xs=(one two "buckle my shoe")
$ dumpargs "${xs[@]} stop"
['one', 'two', 'buckle my shoe stop']
$ dumpargs "${xs[@]} and ${xs[@]}"
['one', 'two', 'buckle my shoe and one', 'two', 'buckle my shoe']
I think this shows at least that it isn't simply special-cased for a pair of quotes directly around an array expansion. The array expansion produces some kind of string-like output, and the quotes affect how that string-like thing is converted into a sequence of arguments. But it's not just a plain string, because it has two distinct kinds of space-like thing in it. It has some sort of "argument separator characters" that will go on to become argument separators regardless of quotes, but it also has "honest to goodness spaces" that won't become argument separators if they are surrounded by quotes. In contrast, ${xs[*]} outputs a regular string with only "honest to goodness spaces" and no special "argument separator characters".
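One way to make the "two kinds of space" idea concrete is to model a quoted expansion as producing a list of words rather than a string. The bash manual states the rule for a double-quoted `${xs[@]}` that occurs within a larger word: the first element is joined with the text before it, and the last element with the text after it. In the sketch below, built on that description, the "argument separators" are simply the boundaries between list items, which quoting cannot remove, while spaces inside an element are ordinary characters. `expand_quoted_word` is a hypothetical helper, and it skips the special case where `"${xs[@]}"` of an empty array produces no word at all.

```python
def expand_quoted_word(pieces):
    """Model one double-quoted word built from literal text and "${xs[@]}"
    expansions. Each item in pieces is either a str (literal text) or a
    list (an array expanded with [@]). Returns the resulting arguments."""
    words = ['']
    for piece in pieces:
        if isinstance(piece, str):
            words[-1] += piece             # literal text extends the current word
        else:
            for i, element in enumerate(piece):
                if i == 0:
                    words[-1] += element   # first element joins the text before it
                else:
                    words.append(element)  # each later element starts a new word
    return words


xs = ['one', 'two', 'buckle my shoe']
# "${xs[@]} stop"
print(expand_quoted_word([xs, ' stop']))
# ['one', 'two', 'buckle my shoe stop']
# "${xs[@]} and ${xs[@]}"
print(expand_quoted_word([xs, ' and ', xs]))
# ['one', 'two', 'buckle my shoe and one', 'two', 'buckle my shoe']
```

Both outputs match the transcripts above, which suggests the expansion is better pictured as a list whose boundaries survive quoting than as a string containing special separator characters.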
Is that a good way to understand it? Is there a better way to understand how and when bash renders an array into a sequence of characters and how and when it splits apart arguments?