BASH: Unavoidable wordsplitting in subcommand expansion?

Question

So I'm writing a BASH shell script to perform some CLI testing for a Node project I'm working on (I didn't tag Node in this question because really this solely pertains to BASH); the CLI testing looks like this:

test_command=$'node source/main.js --input-regex-string \'pcre/(simple)? regex/replace/vim\' -o';
echo $test_command;
$test_command 1>temp_stdout.txt 2>temp_stderr.txt;
test_code=$?;
echo "test_code $test_code"
test_stdout=`cat temp_stdout.txt`;
test_stderr=`cat temp_stderr.txt`;

As you can see, I'm using the C-style quotes $'...', as described here, which should make $test_command expand literally to node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o, and that is indeed what the echo on line 2 shows. However, when I attempt to run the command on line 3, I get an error saying that regex/replace/vim' isn't a recognised command-line parameter in my script. Obviously, what's happening here is that, despite me seemingly quoting and escaping everything correctly, BASH is still splitting the regex/replace/vim' part into its own word. Based on everything I've read on the topic of BASH's quoting and word splitting rules, this shouldn't be happening, and yet it is.

I've tried changing the quoting on the first line to use strong/literal ' quotes ('node source/main.js --input-regex-string "pcre/(simple)? regex/replace/vim" -o', which just causes line 3 to treat the entire thing as one word and thus not work) and the weak/dynamic " quotes ("node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o", with the exact same result as the strong-quote example, not to mention that since the quoted string in this case is a regular expression literal, it's not a good fit for the magic expansion behaviour of " anyway) in place of the C-style quotes, changing the escaping of the command string itself to fit whichever quote style is being used.

I've tried adding additional escaping to the string, such as test_command=$'node source/main.js --input-regex-string \\\'pcre/(simple)?\ regex/replace/vim\\\' -o, only to witness the exact same behaviour.

I've also tried changing the way I invoke the command on line 3: quoting the expansion, or encasing it in { ... } or ${ ... }, in combination with the previously mentioned variations. All of these still resulted in either the original word-splitting problem or a generic "bad substitution" syntax error.

So, in short, my question is what is the correct way to invoke/format a command, stored as a string in a BASH variable, containing a quoted literal string, that BASH won't inexplicably word split the contained quoted string and break the whole command?

@oguzismail: It's not a bad site, although it could use some maintenance. For example, it explains this particular problem [here](https://wiki.bash-hackers.org/syntax/quoting) (which you can get to by starting at [Beginners Mistakes](https://wiki.bash-hackers.org/syntax/newbie_traps)). See [The Hacker's Dictionary](http://hackersdictionary.com/html/index.html) for a historical view of the use of the term *hacker*. — rici, Jun 17 '20 at 06:01
The shell parses quotes before variables are expanded; therefore putting quotes (or escapes) in a variable doesn't do anything useful. See [BashFAQ #50: I'm trying to put a command in a variable, but the complex cases always fail!](http://mywiki.wooledge.org/BashFAQ/050) (and many previous questions along the same lines). The solution is: don't put commands in variables; they're for data, not for executable code. — Gordon Davisson, Jun 17 '20 at 06:04

KamilCuk · Answer 1 · 2020-06-17T07:33:29.207

what is the correct way to invoke/format a command, stored as a string in a BASH variable, containing a quoted literal string, that BASH won't inexplicably word split the contained quoted string and break the whole command?

The "correct" way (for me) is not to store the command as a string in a variable. The correct way would be to use a function, which also lets you add any logic inside:

test_command() {
    node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o "$@"
}
test_command

Another correct way is to store it as an array:

test_command=(node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)
"${test_command[@]}"

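Applied to the test harness from the question, the array form might look like the sketch below. To keep it runnable without Node, it swaps `node source/main.js` for `printf`, which simply prints each argument it receives in brackets; that stand-in and the `mktemp` temporary files are assumptions made for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# Stand-in for `node source/main.js`: printf prints every argument it
# receives on its own line, in brackets, so argument boundaries are visible.
test_command=(printf '[%s]\n' --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)

# Temporary files for the captured streams (illustrative names).
stdout_file=$(mktemp)
stderr_file=$(mktemp)

# Quoted [@] expansion: each array element becomes exactly one argument.
"${test_command[@]}" >"$stdout_file" 2>"$stderr_file"
test_code=$?

test_stdout=$(<"$stdout_file")
test_stderr=$(<"$stderr_file")
rm -f "$stdout_file" "$stderr_file"

echo "test_code $test_code"
echo "$test_stdout"
```

The regex should show up inside a single pair of brackets, which means it reached the command as one argument.
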
If you really need to run a command stored as a string in a variable, there is eval, which is evil. You can correctly escape the arguments, concatenate them into a string, and then execute it with eval:

test_command=$(printf "%q " node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)
eval "$test_command"
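
A round trip makes it easier to see what `%q` does. The sketch below again uses `printf` as a stand-in for the Node command (an assumption, so that it runs anywhere) and stores the regex in a hypothetical `regex` variable so the result can be compared against it.

```shell
#!/usr/bin/env bash
regex='pcre/(simple)? regex/replace/vim'

# Build a shell-quoted command string. %q escapes each argument so that
# the shell can later parse it back into the same single word.
test_command=$(printf '%q ' printf '%s\n' --input-regex-string "$regex" -o)

# Inspect the escaped string before running it.
echo "$test_command"

# eval re-parses the string as shell code, so the escapes are honoured
# and the regex arrives as one argument.
eval_output=$(eval "$test_command")
echo "$eval_output"
```

The second line of the output should be the regex, unchanged and unsplit. This only stays safe as long as every piece of the string went through `%q`.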

this shouldn't be happening but yet it is.

The word splitting is performed on:

The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.

Double or single quotes that result from a parameter expansion are not special; they are taken literally. What matters is only whether the parameter expansion itself is within double quotes. Because $test_command in your code snippet is not within double quotes, its result is word split, which means:

The shell treats each character of $IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators.

And word splitting doesn't care about quotes. The shell cares about quotes only when determining which arguments undergo word splitting: those that are not within double quotes. If an argument undergoes word splitting, the result is just crudely split on whitespace; quotes are not special there.
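
This can be checked with the string from the question. The sketch below loads the split result into the positional parameters with `set --` and counts them; `set -f` is added only to switch off globbing so that nothing but word splitting is at work (the `?` in the regex is a glob character).

```shell
#!/usr/bin/env bash
test_command=$'node source/main.js --input-regex-string \'pcre/(simple)? regex/replace/vim\' -o'

set -f                 # disable globbing for the demonstration
set -- $test_command   # unquoted on purpose: split on IFS, quotes stay literal
set +f

word_count=$#
fourth_word=$4
fifth_word=$5

printf '%s words\n' "$word_count"
printf '[%s]\n' "$@"
```

The embedded single quotes end up as ordinary characters at the start of the fourth word and the end of the fifth, which is exactly the `regex/replace/vim'` argument the script complained about.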

score 1 · Accepted Answer · edited Mar 22 '23 at 08:23

what is the correct way to invoke/format a command, stored as a string in a BASH variable, containing a quoted literal string

You assume that there is no difference between

typing a command directly into the terminal/script
storing the exact same command string into a variable and then executing $variable.

But there are many differences! Commands typed directly into bash undergo more processing steps than anything else. These steps are documented in bash's manual:

1. Tokenization
   Quotes are interpreted. Operators are identified. The command is split into words at whitespace between unquoted parts. IFS is not used here.
2. Several expansions in a left-to-right fashion. That is, after one of these transformations has been applied to a token, bash continues to process its result with step 3. For example, you could safely use a home directory with a literal $ in its pathname, as the result of expanding ~ does not undergo variable expansion; thus the $ remains uninterpreted.
   - brace expansion {1..9}
   - tilde expansion ~
   - parameter and variable expansion $var
   - arithmetic expansion $((...))
   - command substitution $(...), `...`
   - process substitution <()
3. Word splitting
   Split the result of unquoted expansions using IFS.
4. Filename expansion
   Also known as globbing: *, ?, [...] and more with shopt -s extglob.

Admittedly, this confuses most bash beginners. It seems to me that most of Stack Overflow's bash questions are about things related to these processing steps. Some classic examples are "`for i in {1..$n}` does not work" and "`echo $var` does not print what I assigned to `var`".
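
Both classic cases can be reproduced in a few lines. This is a minimal sketch with made-up values for `n` and `var`; the results are captured in variables so they are easy to compare.

```shell
#!/usr/bin/env bash
n=3
# Brace expansion is attempted before $n is expanded, so {1..$n} is not
# a valid sequence at that point and the braces are left as they are.
brace_result=$(echo {1..$n})

var='a    b'
# Unquoted: the value is word-split, and echo joins the words with one space.
unquoted_result=$(echo $var)
# Quoted: no word splitting, so the inner spaces survive.
quoted_result=$(echo "$var")

printf '%s\n' "$brace_result" "$unquoted_result" "$quoted_result"
```

The first result keeps its braces and the second loses the run of spaces, both because of the order of the processing steps.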

Strings from unquoted variables only undergo some of the processing steps listed above. As described, these steps are "3. word splitting" and "4. filename expansion".

If you want to apply all processing steps to a string, you can use the eval command. However, this is very frowned upon as there are either better alternatives (if you define the command yourself) or huge security implications (if an outsider defines the command).
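
To make the security concern concrete, here is a deliberately harmless sketch. The `user_input` value is a hypothetical stand-in for a string that comes from outside the script; the part after the semicolon is a second command hidden inside it.

```shell
#!/usr/bin/env bash
# Hypothetical untrusted input with a second command after the ';'.
user_input='hello; injected=yes'

injected=no
# eval re-parses the whole string as shell code, so the embedded
# assignment runs in addition to the intended one.
eval "msg=$user_input"

echo "msg=$msg injected=$injected"
```

Here the hidden command only flips a variable, but it could just as well have been anything the script's user is allowed to run.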

In your example, I don't see a reason to store the command at all. But if you really want to access it as a string somewhere else, then use an array:

command=(node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)
echo "${command[*]}" # print
"${command[@]}"      # execute

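One caveat about the print line: joining with `[*]` is readable, but it no longer shows where one argument ends and the next begins. If the printed form should be unambiguous, or pasteable back into a shell, `printf '%q '` is one option. A small sketch, reusing the array from above (the `plain` and `quoted` variable names are made up for illustration):

```shell
#!/usr/bin/env bash
command=(node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)

# "${command[*]}" joins the elements with spaces: readable, but the
# argument boundaries are lost.
plain="${command[*]}"

# printf %q re-quotes every element, so the boundaries stay visible and
# the output can be parsed back by the shell.
quoted=$(printf '%q ' "${command[@]}")

echo "$plain"
echo "$quoted"
echo "${#command[@]} arguments"
```
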