I am trying to get a definitive understanding of the Bash parser’s order of business.
This wiki page claims the following order:
- Read line.
- Process/remove quotes.
- Split on semicolons.
- Process 'special operators', which according to the article are:
  - Command groupings and brace expansions, e.g. `{…}`.
  - Process substitutions, e.g. `cmd1 <(cmd2)`.
  - Redirections.
  - Pipelines.
- Perform expansions, which are not all listed, but should include:
  - Brace expansion, e.g. `{1..3}`. For some reason the article tucks this into the previous stage.
  - Tilde expansion, e.g. `~root`.
  - Parameter & variable expansion, e.g. `${var##*/}`.
  - Arithmetic expansion, e.g. `$((1+12))`.
  - Command substitution, e.g. `$(date)`.
  - Word splitting, which applies to the results of the expansions; uses `$IFS`.
  - Pathname expansion, or globbing, e.g. `ls ?d*`.
  - Word splitting, which applies to the whole line; does not use `$IFS`.
- Execution.
This is not a quote, but a paraphrase of the linked article's contents.
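Some of the orderings in this list can be probed directly. The sketch below is my own check, not something from the article, and the variable names (`x`, `t`, `c`, `r`, `brace`, …) are arbitrary. The idea is to let parameter expansion produce text that an *earlier* stage would have acted on, and observe that the earlier stage is not applied again:

```shell
#!/usr/bin/env bash
# Brace expansion runs before parameter expansion: when braces are
# expanded the word is still literally {1..$x}, which is not a valid
# sequence expression, so it is left unchanged and $x is filled in later.
x=3
brace=$(echo {1..$x})        # {1..3}, not "1 2 3"

# Tilde expansion also precedes parameter expansion, so a tilde that
# only appears in the value of $t is not expanded.
t='~'
tilde=$(echo $t)             # ~

# Command separators and redirection operators are recognised before any
# expansion, so when they come out of an expansion they are plain text.
c='echo a; echo b'
semi=$($c)                   # a; echo b   (one echo with three arguments)
r='>/dev/null'
redir=$(echo hi $r)          # hi >/dev/null   (no redirection happens)

printf '%s\n' "$brace" "$tilde" "$semi" "$redir"
```

This only confirms relative order for the operations it touches; it says nothing about where quote handling sits.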
Furthermore, there are the Bash man pages, and this SO answer, which claims to be based on those pages. According to the answer, the stages of command parsing are as follows:
- **initial word splitting**
- brace expansion
- tilde expansion
- parameter, variable and arithmetic expansion
- command substitution
- **secondary word splitting**
- path expansion (aka globbing)
- **quote removal**
Emphasis mine.
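The position of quote removal can be probed in a similar way. In this minimal sketch (my own; `v`, `count`, `first`, `second` are arbitrary names), the quote characters come from an expansion rather than from the typed input. They neither protect against word splitting nor get removed, which is consistent with the manual describing quote removal as a final step that only strips quotes that did not result from an expansion:

```shell
#!/usr/bin/env bash
# The value of v contains literal double quotes.
v='"a b"'

# Unquoted $v is word-split on the space; the embedded quotes are just
# ordinary characters at this point, so they do not keep "a b" together.
set -- $v
count=$#          # 2
first=$1          # "a   (leading double quote kept)
second=$2         # b"   (trailing double quote kept)

printf '<%s>\n' "$@"
# <"a>
# <b">
```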
I am assuming that by “initial word splitting” the author means splitting of the entire line, and by “secondary word splitting” they mean splitting of the results of the expansions. This would imply that there are at least two distinct tokenization passes during command parsing.
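If that reading is right, the two passes should respond differently to `$IFS`. A minimal sketch to check this (variable names are arbitrary; it should behave the same in any POSIX shell): literal text on the command line is not split by `IFS=:`, while the unquoted result of an expansion is:

```shell
#!/usr/bin/env bash
# Literal words are delimited by the parser using blanks and operators;
# setting IFS=: does not split the literal word a:b.
IFS=:
set -- a:b
literal_count=$#      # 1

# The unquoted result of an expansion IS split again, using $IFS.
v='a:b'
set -- $v
expanded_count=$#     # 2

# Quoting the expansion suppresses that second split.
set -- "$v"
quoted_count=$#       # 1

unset IFS             # an unset IFS behaves like the default (space, tab, newline)
echo "$literal_count $expanded_count $quoted_count"   # 1 2 1
```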
Given the contradictions in ordering between the two sources, what is the actual order in which the input command line is de-quoted and split into words/tokens, relative to the other operations being performed?
EDIT NOTE:
To explain parts of the answers: an earlier version of this question included a sub-question:
Why does `cmd='var=foo';$cmd` produce `bash: var=foo: command not found`?
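For reference, that behaviour can be reproduced non-interactively. A minimal sketch; my understanding (hedged) is that whether a word is an assignment is decided while the command is parsed, before expansions run, so the `var=foo` produced by `$cmd` is never reconsidered as an assignment. `status` is an arbitrary name, and the error message is silenced with `2>/dev/null`:

```shell
#!/usr/bin/env bash
unset var
cmd='var=foo'

# $cmd is an ordinary word at parse time; after expansion it yields
# var=foo, which is then looked up as a command name.
status=0
$cmd 2>/dev/null || status=$?   # without the redirect: "var=foo: command not found"

echo "status=$status var=${var-<unset>}"   # status=127 var=<unset>
```

127 is the exit status the shell uses when a command is not found, and `var` is never assigned.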