I have a basic understanding of how bash splits a line into arguments for a program, enough to avoid problems with arguments containing spaces, but I would like to go that extra step and understand what is happening and why. Most guides tell you what to do, but not why it works. Some examples might help to explain...

I'll be using this short Python script to dump argument lists:

#!/usr/bin/env python3
import sys
# Print every argument after the script name, as a Python list.
print(sys.argv[1:])

Let's call it "dumpargs". (You could write it in C or even bash, but Python is concise enough and I don't want to confuse matters by contending with an extra layer of bash interpreting and expanding strings.)

First of all, some easy examples:

$ dumpargs foo bar baz
['foo', 'bar', 'baz']
$ dumpargs "foo bar" baz
['foo bar', 'baz']

Okay, great. We can use quotes to pass an argument that contains spaces by wrapping the quotes around it. But we're not restricted to putting the quotes on the outside of the argument. What if we put them in the middle?

$ dumpargs foo" "bar
['foo bar']
$ dumpargs foo" "bar" "baz xyzzy
['foo bar baz', 'xyzzy']

Okay, cool. I think this demonstrates that the quotes just modify how the spaces are interpreted. Spaces occurring between double quotes aren't argument separators. The unquoted spaces become separators, the quoted spaces become genuine spaces and the quotes evaporate.
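
To check that reading without leaving bash, here is a minimal sketch that only counts arguments. `count_args` is a hypothetical helper I made up for this, not a standard command; the counts in the comments are what I would expect from the quoting rules described above.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

count_args foo" "bar          # 1: the quoted space is a literal character
count_args foo' 'bar"  "baz   # 1: adjacent quoted and unquoted pieces form one word
count_args foo" "bar baz      # 2: only the unquoted space separates arguments
count_args ""                 # 1: a quoted empty string is still an argument
```

The last line is the one that surprised me least once I thought of quotes as modifying characters rather than wrapping arguments: the quotes vanish, but the (empty) word they protected remains.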

What about arrays?

$ xs=(one two "buckle my shoe")

$ dumpargs ${xs[*]}
['one', 'two', 'buckle', 'my', 'shoe']
$ dumpargs ${xs[@]}
['one', 'two', 'buckle', 'my', 'shoe']
$ dumpargs "${xs[*]}"
['one two buckle my shoe']
$ dumpargs "${xs[@]}"
['one', 'two', 'buckle my shoe']

Clearly the last of the four is most generally useful, and most likely what we want to use in places where our array represents, say, a list of filenames. The others all confuse the spaces in "buckle my shoe" with the separators between array elements. But what is it actually doing? It looks like it's composed of a variable expansion and a quoting operation. Is it? Or does bash just use special treatment for the case when it sees double quotes immediately surrounding an array expansion?
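
For the filename case, a small sketch of what goes wrong without the quotes. The file names are made up for illustration and nothing is actually read from disk; the loops only count iterations.

```shell
#!/usr/bin/env bash
# Hypothetical file list; the names are invented for this example.
files=("report.txt" "my notes.txt" "draft 2 final.txt")

# Quoted [@]: the loop body runs once per array element, spaces intact.
quoted=0
for f in "${files[@]}"; do
  quoted=$((quoted + 1))
done

# Unquoted [@]: every element is split again on whitespace.
unquoted=0
for f in ${files[@]}; do
  unquoted=$((unquoted + 1))
done

echo "quoted: $quoted, unquoted: $unquoted"   # quoted: 3, unquoted: 6
```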

Here are some more examples to try to test what's going on:

$ xs=(one two "buckle my shoe")

$ dumpargs "${xs[@]} stop"
['one', 'two', 'buckle my shoe stop']
$ dumpargs "${xs[@]} and ${xs[@]}"
['one', 'two', 'buckle my shoe and one', 'two', 'buckle my shoe']

I think this shows at least that it isn't simply special-cased for a pair of quotes directly around an array expansion. The array expansion produces some kind of string-like output, and the quotes affect how that string-like thing is converted into a sequence of arguments. But it's not just a plain string, because it has two distinct kinds of space-like thing in it. It has some sort of "argument separator characters" that will go on to become argument separators regardless of quotes, but it also has "honest to goodness spaces" that won't become argument separators if they are surrounded by quotes. In contrast, ${xs[*]} outputs a regular string with only "honest to goodness spaces" and no special "argument separator characters".

Is that a good way to understand it? Is there a better way to understand how and when bash renders an array into a sequence of characters and how and when it splits apart arguments?
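
A few more probes of the same hypothesis, as a sketch. `show` is a hypothetical helper that prints each argument in angle brackets, and `empty` is an extra array I added for the test. The results fit the idea that a quoted `[@]` expansion yields one word per element, with any surrounding text glued onto the first and last words.

```shell
#!/usr/bin/env bash
# show is a hypothetical helper: prints each argument wrapped in angle brackets.
show() { printf '<%s>' "$@"; echo; }

xs=(one two "buckle my shoe")
empty=()

show "pre-${xs[@]}-post"   # <pre-one><two><buckle my shoe-post>
show "${xs[@]}${xs[@]}"    # <one><two><buckle my shoeone><two><buckle my shoe>
show "${empty[@]}" end     # <end>    an empty array contributes no words at all
show "${empty[*]}" end     # <><end>  [*] always yields exactly one word
```

The empty-array case is the one a pure "string with special separator characters" model struggles with: a quoted string normally produces at least one (possibly empty) argument, yet `"${empty[@]}"` produces none.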

Weeble
  • @devnull: Care to explain your comment? – Aaron Digulla Feb 25 '14 at 11:23
  • @AaronDigulla If the OP _actually did so much research_, saying `set -x` and `echo` instead of `dumpargs` would have explained much of it. That said, I see 2 questions here: (1) difference between `@` and `*`, (2) effect of quoting variables; both of which have been asked and answered numerous times. – devnull Feb 25 '14 at 11:26
  • @devnull No. It's true that I don't *need* to know this to solve an immediate problem, but I consider it an instance of "give someone a fish and they'll eat for a day, teach them to fish and they'll eat for a lifetime". I want a deeper understanding, but I'm finding it hard to find guides that do anything more than "give me a fish". – Weeble Feb 25 '14 at 11:26
  • @Weeble As mentioned in my previous comment, you seem to have two questions. Moreover, doing `set -x` before executing your commands would pretty much tell it all. – devnull Feb 25 '14 at 11:32
  • @devnull Thanks, I didn't know about `set -x`. But I can rewrite the question using `set -x` instead of dumpargs and it remains pretty much unchanged. As far as I can tell, `set -x` shows the end result of all the variable expansions and argument splitting, but doesn't give any insight into what happens in the middle. – Weeble Feb 25 '14 at 11:42
  • It's documented behaviour: http://www.gnu.org/software/bash/manual/bashref.html#Arrays – glenn jackman Feb 25 '14 at 11:58
  • @glennjackman Thanks! The paragraph beginning "Any element of an array..." does explain it precisely. I just wasn't looking in the right places. I don't feel as much wiser as I hoped, but that exactly answers the question I asked, and does so from the most authoritative source. I'll accept that if posted as an answer. – Weeble Feb 25 '14 at 12:09

1 Answer

The origin of this behavior is probably the old "pass arguments to subshell" problem. In the beginning, we had only $*, which worked until arguments started to contain spaces.

 Input         Subshell sees
 a b           "a" "b"
 "a b"         "a" "b"
 a b\ c        "a" "b" "c" 
 a b\\\ c      "a" "b c" 

We could quote $* but that would merge all arguments into a single string parameter (i.e. the subshell would always see "a b" or "a b c"). Clearly, that's no good.

So the @ form was introduced. Without quotes, $* and $@ behave the same way. With quotes, "$@" expands to a separate word for each argument, as if each one had been quoted individually.
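
A minimal sketch of the three forwarding styles, assuming a wrapper that simply passes its arguments on. `count_args` and the three `forward_*` functions are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for the program being wrapped.
count_args() { echo "$#"; }

# Three hypothetical wrappers that differ only in how they forward arguments.
forward_star()        { count_args $*; }
forward_quoted_star() { count_args "$*"; }
forward_quoted_at()   { count_args "$@"; }

forward_star        a "b c"   # 3: "b c" is split again into "b" and "c"
forward_quoted_star a "b c"   # 1: everything is merged into "a b c"
forward_quoted_at   a "b c"   # 2: the original argument boundaries survive
```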

When ksh and bash introduced arrays, they kept the symmetry (without the * form, you couldn't turn the array into a single string).
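
As a sketch of why the single-string form is still worth having: a quoted * expansion joins the elements with the first character of IFS, which makes it a convenient join operation. `join_with` is a hypothetical helper written for this example, and `xs` reuses the array from the question.

```shell
#!/usr/bin/env bash
xs=(one two "buckle my shoe")

# join_with is a hypothetical helper: it joins its remaining arguments using
# the first argument (a single character) as the separator.
join_with() {
  local IFS=$1   # "$*" joins with the first character of IFS
  shift
  echo "$*"
}

join_with , "${xs[@]}"   # one,two,buckle my shoe
join_with : "${xs[@]}"   # one:two:buckle my shoe

# The same idea applied directly to the array; the subshell keeps the IFS
# change from leaking into the rest of the script.
( IFS=,; echo "${xs[*]}" )   # one,two,buckle my shoe
```

Note that the helper receives the elements via `"${xs[@]}"` so that their boundaries are intact when they arrive, and only then flattens them with `"$*"`.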

Aaron Digulla
  • I like this because it gives useful history and context, but I think this still isn't enough to really understand what's going on when anything else appears inside quotes along with the $@. I understand what `"$@"` means as a composed unit. I still have a poor understanding of what `$@` means alone or in a larger quoted construction. I have a hypothesis, described in the question, and it's consistent with what I observe, but is it true? – Weeble Feb 25 '14 at 11:52
  • `$@` checks if it is quoted and if so, it expands to a properly quoted sequence of the elements of the array. If you look at the source code of the parser, you'll see it actually looks for `"$@"` as a token in the input (search for `select_command:` in http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y) – Aaron Digulla Feb 25 '14 at 12:46
  • The parsing of each line is also in that file; `read_token_word()` handles the quotes, and I think the handling of `"$@"` is hidden in there somewhere as well. – Aaron Digulla Feb 25 '14 at 12:51
  • *grumble* re: linking the ABS (which is to bash much as W3Schools is to Javascript -- an outdated resource full of bad-practice examples but with a whole lot of Google juice). As alternatives, consider [BashFAQ #5](http://mywiki.wooledge.org/BashFAQ/005), the [BashGuide on Arrays](http://mywiki.wooledge.org/BashGuide/Arrays), or the [Bash-Hackers' Wiki on Arrays](http://wiki.bash-hackers.org/syntax/arrays). Or of course the [manual](https://www.gnu.org/software/bash/manual/bashref.html#Arrays). – Charles Duffy Apr 24 '17 at 17:38
  • @CharlesDuffy Thanks, fixed. – Aaron Digulla May 30 '17 at 12:51