To give context, I'm trying to create a simple version of bash, and for that I need to mimic the way bash parses content with multiple sets of single and double quotes. I can't figure out the overall procedure by which bash handles quotes inside quotes. I noticed some repeated patterns but still don't have the full picture.

For instance this example:

$ "'"ls"'"

evaluates to:

$ 'ls'

or even this uglier example:

$ "'"'"""'"ls"'"""'"'"

evaluates to:

$ '"""ls"""'

I noticed the following patterns arise:

  • if the number of wrapping quotes is even, it evaluates to what's inside the inverse quotes, excluding those quotes
  • if the number of wrapping quotes is odd, it evaluates to what's inside the inverse quotes, including those quotes.

For example, with an even number of wrapping quotes:

$ ""'ls'""

evaluates to what's inside the inverse quotes (single quotes), without the single quotes themselves:

$ ls

Or, with an odd number of wrapping quotes:

$ '''"ls"'''

it evaluates to the content of the double quotes, including the quotes themselves:

$ "ls" : command not found.

Still, I don't get the full picture of how bash parses more complex cases of quotes inside quotes.

Charles Duffy
interesting
    It's not really "quotes inside quotes". The string `""'ls'""` does not contain any embedded quotes. Instead it is the concatenation of 3 strings. The empty string (denoted by `""`) followed by `ls` (given as `'ls'`) and the empty string again. – William Pursell May 13 '22 at 17:25
    Are you aware that Bash mostly follows [POSIX sh](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html), a well documented standard? – that other guy May 13 '22 at 17:25
    As that other guy implied: Trying to build "a simple version of bash" is a bad place to start in the first place: bash has extra complications added on top of a simpler (but not simple) standard. Start from that standard; if you need anything it doesn't cover, build from there _after_ you have the basics working correctly. – Charles Duffy May 13 '22 at 17:30
    BTW, it wasn't relevant to your question here, but bash tracks which quoting type each character had, because downstream processing can be different depending on how each individual character was quoted. You don't want to build a parser based only on this question's answer and discover later that you weren't tracking how each character was parsed (and that you need that information to correctly implement some behaviors)! – Charles Duffy May 13 '22 at 17:33
    @CharlesDuffy How does the type of quoting affect downstream processing? My mental model has always been that bash merely tracks *if* a character was quoted. You're saying that single quotes vs. double quotes vs. backslashes can have different observable behavior? – John Kugelman May 13 '22 at 17:38
  • @JohnKugelman, I actually do think you're right there; the cases I had in mind are all "is it quoted at all?" ones, but I was being broad to protect from potential error (at least if we don't consider parameter expansion &c "downstream" of quote analysis) – Charles Duffy May 13 '22 at 17:40
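
To illustrate the point raised in the comments about remembering whether each character was quoted, here is a minimal Python sketch. The function name `scan_with_flags` is hypothetical, and the sketch assumes only single and double quotes matter (no backslash escapes, no expansions). It is not how bash is implemented internally; it only shows one way to keep the "was this character quoted?" bit around for later stages such as globbing.

```python
from typing import List, Tuple


def scan_with_flags(word: str) -> List[Tuple[str, bool]]:
    """Return (character, was_quoted) pairs with the quote characters removed.

    Assumption: only ' and " are treated as quoting; backslashes and
    expansions are ignored in this sketch.
    """
    result = []
    quote = None  # the quote character currently open, or None
    for ch in word:
        if quote is None and ch in ("'", '"'):
            quote = ch          # opening quote: start a quoted region
        elif ch == quote:
            quote = None        # matching closing quote: end the region
        else:
            # Ordinary character, or the "other" quote type inside a region.
            result.append((ch, quote is not None))
    if quote is not None:
        raise ValueError(f"unterminated {quote} quote")
    return result


if __name__ == "__main__":
    # The first * was quoted, the second was not.
    print(scan_with_flags("'*'*"))
```

A later stage could then, for example, treat only the unquoted `*` as a glob character.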

1 Answer

Quotes are processed sequentially, looking for matching closing quotes. Everything between the starting and ending quote becomes a single string. Nested quotes have no special meaning.
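
As a rough illustration of that sequential scan, here is a minimal Python sketch. The function name `remove_quotes` is made up for this example, and it deliberately ignores backslash escapes and expansions such as `$` inside double quotes, so it only covers the kind of input shown in the question.

```python
def remove_quotes(word: str) -> str:
    """Strip shell-style quotes from one word by scanning left to right.

    Assumption: only ' and " are special; backslash escapes and
    expansions inside double quotes are not handled in this sketch.
    """
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in ("'", '"'):
            # Look for the next occurrence of the SAME quote character.
            end = word.find(ch, i + 1)
            if end == -1:
                raise ValueError(f"unterminated {ch} quote at index {i}")
            # Everything in between is literal, including the other quote type.
            out.append(word[i + 1:end])
            i = end + 1
        else:
            # Unquoted character: copy it as-is.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ('"\'"ls"\'"', '""\'ls\'""', '\'\'\'"ls"\'\'\''):
        print(word, "->", remove_quotes(word))
```

Note that there is no counting of quotes and no notion of nesting: the scanner is either outside quotes or inside exactly one quoted string.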

"'"ls"'"

When bash processes the first ", it scans for the next ", which ends the string; that string contains just '.

Then it reads the unquoted string ls.

When it processes the " after ls, it again scans for the next ", producing another string that contains just '.

These are all concatenated, resulting in 'ls'.

"'"'"""'"ls"'"""'"'"

"'" is the string '.

'"""' is the string """.

"ls" is the string ls.

'"""' is the string """.

"'" is the string '.

Concatenating them all together produces '"""ls"""'
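
The breakdown above can be reproduced with a small Python sketch that returns the individual pieces instead of the joined result. The name `split_segments` is hypothetical, and as before the sketch assumes no backslash escapes or expansions.

```python
from typing import List, Optional, Tuple


def split_segments(word: str) -> List[Tuple[Optional[str], str]]:
    """Split a word into (quote_char, text) pieces.

    quote_char is "'" or '"' for a quoted piece and None for unquoted text.
    Assumption: backslash escapes and expansions are not handled.
    """
    segments = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in ("'", '"'):
            # Quoted piece: runs up to the next matching quote character.
            end = word.find(ch, i + 1)
            if end == -1:
                raise ValueError(f"unterminated {ch} quote at index {i}")
            segments.append((ch, word[i + 1:end]))
            i = end + 1
        else:
            # Unquoted piece: runs up to the next quote character of either kind.
            start = i
            while i < len(word) and word[i] not in ("'", '"'):
                i += 1
            segments.append((None, word[start:i]))
    return segments


if __name__ == "__main__":
    word = '"\'"\'"""\'"ls"\'"""\'"\'"'
    pieces = split_segments(word)
    for quote, text in pieces:
        print(repr(quote), repr(text))
    print("".join(text for _, text in pieces))
```

Joining the text of all pieces gives the same result as the concatenation described above.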

""'ls'""

"" is an empty string. 'ls' is the string ls. "" is another empty string. Concatenating them together produces ls.

'''"ls"'''

'' is an empty string. '"ls"' is the string "ls" (containing literal double quotes). '' is an empty string. Concatenating them produces "ls". Since there's no command with that name (including the literal double quotes), you get an error.

There are differences between single and double quotes, but they don't affect any of the examples you posted. See "Difference between single and double quotes in Bash".

Barmar