25

Suppose I have a #!/bin/sh script which can take a variety of positional parameters, some of which may include spaces, either/both kinds of quotes, etc. I want to iterate "$@" and for each argument either process it immediately somehow, or save it for later. At the end of the script I want to launch (perhaps exec) another process, passing in some of these parameters with all special characters intact.

If I were doing no processing on the parameters, othercmd "$@" would work fine, but I need to pull out some parameters and process them a bit.

If I could assume Bash, then I could use printf %q to compute quoted versions of args that I could eval later, but this would not work on e.g. Ubuntu's Dash (/bin/sh).

Is there any equivalent to printf %q that can be written in a plain Bourne shell script, using only built-ins and POSIX-defined utilities, say as a function I could copy into a script?

For example, a script trying to ls its arguments in reverse order:

#!/bin/sh
args=
for arg in "$@"
do
    args="'$arg' $args"
done
eval "ls $args"

works for many cases:

$ ./handle goodbye "cruel world"
ls: cannot access cruel world: No such file or directory
ls: cannot access goodbye: No such file or directory

but not when ' is used:

$ ./handle goodbye "cruel'st world"
./handle: 1: eval: Syntax error: Unterminated quoted string

and the following works fine but relies on Bash:

#!/bin/bash
args=
for arg in "$@"
do
    printf -v argq '%q' "$arg"
    args="$argq $args"
done
eval "ls $args"
Jesse Glick
  • 2
    POSIX sh is not "POSIX Bourne", but "POSIX sh"; it's an early-90s specification far closer to ksh88 than to 70s-era Bourne. – Charles Duffy May 03 '15 at 03:55
  • I've found portable and bash-specific implementations of a function for this purpose (`func_quote`) being discussed on the libtool project's mailing list: https://lists.gnu.org/archive/html/bug-libtool/2015-10/msg00009.html – imz -- Ivan Zakharyaschev Apr 16 '17 at 10:23

6 Answers

13

This is absolutely doable.

The answer you see by Jesse Glick is approximately there, but it has a couple of bugs, and I have a few more alternatives for your consideration, since this is a problem I ran into more than once.

First, and you might already know this: echo is a bad idea if the goal is portability, and one should use printf instead. POSIX leaves echo's behavior implementation-defined when its first argument is "-n" (or when any argument contains a backslash), and in practice some implementations of echo treat -n as an option, while others print it like any other argument. So that becomes this:

esceval()
{
    printf %s "$1" | sed "s/'/'\"'\"'/g"
}
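
As a hedged sketch of how this slots into the question's reverse-ls script: the helper name reverse_quoted is my own invention, not something from the question. Note that this version of esceval only escapes the embedded single quotes, so the caller still has to add the surrounding ones.

```shell
#!/bin/sh
# esceval as defined above: escapes embedded single quotes only.
esceval()
{
    printf %s "$1" | sed "s/'/'\"'\"'/g"
}

# Hypothetical helper: print the arguments in reverse order,
# each one wrapped in single quotes so the result is safe to eval.
reverse_quoted()
{
    args=
    for arg in "$@"
    do
        args="'$(esceval "$arg")' $args"
    done
    printf '%s\n' "$args"
}
```

The question's last line would then read eval "ls $(reverse_quoted "$@")".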

Alternatively, instead of escaping embedded single quotes by making them into:

'"'"'

..instead you could turn them into:

'\''

..a stylistic difference, I guess (I imagine the performance difference is negligible either way, though I've never tested it). The resulting sed string looks like this:

esceval()
{
    printf %s "$1" | sed "s/'/'\\\\''/g"
}

(It's four backslashes because the double quotes swallow two of them, leaving two, and then sed swallows one, leaving just the one. Personally, I find this way more readable, so that's what I'll use in the rest of the examples that involve it, but both should be equivalent.)
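
To make the two layers of backslash-swallowing visible, here is a small sketch. The variable name sed_script is just for illustration.

```shell
#!/bin/sh
# The double-quoted shell string, stored so both layers can be inspected.
sed_script="s/'/'\\\\''/g"

# Layer 1: what the shell hands to sed (two backslashes remain).
printf '%s\n' "$sed_script"                 # prints: s/'/'\\''/g

# Layer 2: what sed writes for each embedded single quote (one backslash remains).
printf '%s\n' "it's" | sed "$sed_script"    # prints: it'\''s
```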

BUT, we still have a bug: command substitution strips trailing newlines from the command output (POSIX specifies that all of them are removed; not all whitespace, just newlines specifically). So the above solution works unless you have newline(s) at the very end of an argument, in which case you'll lose them. The fix is simple: add another character after the actual value before outputting it from your quote/esceval function. Incidentally, we needed to do that anyway, because the escaped argument has to start and end with single quotes. You have two alternatives:

esceval()
{
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1 s/^/'/; $ s/$/'/"
}

This will ensure the argument comes out already fully escaped, no need for adding more single quotes when building the final string. This is probably the closest thing you will get to a single, inline-able version. If you're okay with having a sed dependency, you can stop here.
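
A quick round-trip check, as a sketch: the variable names (nl, tricky, naive, quoted, roundtrip) are made up for the demonstration. It shows a plain command substitution losing the trailing newlines, while the quoted form survives because the closing quote comes after them.

```shell
#!/bin/sh
esceval()
{
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1 s/^/'/; $ s/$/'/"
}

nl='
'
tricky="it's two lines$nl$nl"

# Plain command substitution: the trailing newlines are stripped.
naive=$(printf %s "$tricky")

# Quoted form: the closing quote protects the trailing newlines.
quoted=$(esceval "$tricky")
eval "roundtrip=$quoted"

[ "$naive" = "$tricky" ] || printf '%s\n' "naive capture lost the newlines"
[ "$roundtrip" = "$tricky" ] && printf '%s\n' "quoted round trip preserved them"
```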

If you're not okay with the sed dependency, but you're fine with assuming that your shell is actually POSIX-compliant (there are still some non-compliant ones out there, notably the /bin/sh on Solaris 10 and below, which won't be able to run this next variant, but almost all shells you need to care about will do this just fine):

esceval()
{
    printf \'
    unescaped=$1
    while :
    do
        case $unescaped in
        *\'*)
            printf %s "${unescaped%%\'*}""'\''"
            unescaped=${unescaped#*\'}
            ;;
        *)
            printf %s "$unescaped"
            break
        esac
    done
    printf \'
}

You might notice seemingly redundant quoting here:

printf %s "${unescaped%%\'*}""'\''"

..this could be replaced with:

printf %s "${unescaped%%\'*}'\''"

The only reason I do the former is that once upon a time there were Bourne shells which had bugs when substituting variables into quoted strings where the quotes around the variable didn't start and end exactly where the variable substitution did. Hence it's a paranoid portability habit of mine. In practice, you can do the latter, and it won't be a problem.
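
For readers less familiar with the two parameter expansions the loop relies on, here is a minimal sketch of one iteration. The names before and after are illustrative only.

```shell
#!/bin/sh
unescaped="a'b'c"

# %% removes the longest suffix matching '* : everything from the first quote on.
before=${unescaped%%\'*}     # a

# # removes the shortest prefix matching *' : everything up to the first quote.
after=${unescaped#*\'}       # b'c

printf '%s\n' "$before" "$after"
```

Each pass of the loop prints the "before" part followed by an escaped quote, then continues with the "after" part until no quote is left.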

If you don't want to clobber the variable unescaped in the rest of your shell environment, then you can wrap the entire contents of that function in a subshell, like so:

esceval()
{
  (
    printf \'
    unescaped=$1
    while :
    do
        case $unescaped in
        *\'*)
            printf %s "${unescaped%%\'*}""'\''"
            unescaped=${unescaped#*\'}
            ;;
        *)
            printf %s "$unescaped"
            break
        esac
    done
    printf \'
  )
}
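
A small sketch of what the subshell buys you. This only matters when the function is called directly; inside $(...) it already runs in a subshell. The caller's value used here is made up for the demonstration.

```shell
#!/bin/sh
esceval()
{
  (
    printf \'
    unescaped=$1
    while :
    do
        case $unescaped in
        *\'*)
            printf %s "${unescaped%%\'*}""'\''"
            unescaped=${unescaped#*\'}
            ;;
        *)
            printf %s "$unescaped"
            break
        esac
    done
    printf \'
  )
}

# The caller happens to use the same variable name.
unescaped="caller's own value"
esceval "it's" >/dev/null       # called directly, not via $(...)
printf '%s\n' "$unescaped"      # still prints: caller's own value
```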

"But wait", you say: "What I want to do this on MULTIPLE arguments in one command? And I want the output to still look kinda nice and legible for me as a user if I run it from the command line for whatever reason."

Never fear, I have you covered:

esceval()
{
    case $# in 0) return 0; esac
    while :
    do
        printf "'"
        printf %s "$1" | sed "s/'/'\\\\''/g"
        shift
        case $# in 0) break; esac
        printf "' "
    done
    printf "'\n"
}

..or the same thing, but with the shell-only version:

esceval()
{
  case $# in 0) return 0; esac
  (
    while :
    do
        printf "'"
        unescaped=$1
        while :
        do
            case $unescaped in
            *\'*)
                printf %s "${unescaped%%\'*}""'\''"
                unescaped=${unescaped#*\'}
                ;;
            *)
                printf %s "$unescaped"
                break
            esac
        done
        shift
        case $# in 0) break; esac
        printf "' "
    done
    printf "'\n"
  )
}

In those last four, you could collapse some of the outer printf statements and roll their single quotes up into another printf - I kept them separate because I feel it makes the logic more clear when you can see the starting and ending single-quotes on separate print statements.
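
To tie this back to the question, here is a sketch of saving arguments and restoring them later with eval. It uses the shell-only multi-argument version; restore_demo is a hypothetical wrapper that exists only to show the round trip.

```shell
#!/bin/sh
esceval()
{
  case $# in 0) return 0; esac
  (
    while :
    do
        printf "'"
        unescaped=$1
        while :
        do
            case $unescaped in
            *\'*)
                printf %s "${unescaped%%\'*}""'\''"
                unescaped=${unescaped#*\'}
                ;;
            *)
                printf %s "$unescaped"
                break
            esac
        done
        shift
        case $# in 0) break; esac
        printf "' "
    done
    printf "'\n"
  )
}

# Hypothetical wrapper: save the arguments, discard them, restore them.
restore_demo()
{
    saved=$(esceval "$@")
    set --                      # pretend the arguments were consumed
    eval "set -- $saved"        # bring them back, special characters intact
    printf '<%s>\n' "$@"        # stand-in for: exec othercmd "$@"
}
```

Calling restore_demo "it's" "two  words" prints each argument on its own line, wrapped in angle brackets, exactly as it was passed in.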

P.S. There's also this monstrosity I made, which is a polyfill which will select between the previous two versions depending on if your shell seems to be capable of supporting the necessary variable substitution syntax (it looks awful though, because the shell-only version has to be inside an eval-ed string to keep the incompatible shells from barfing when they see it): https://github.com/mentalisttraceur/esceval/blob/master/sh/esceval.sh

mtraceur
  • 1
    Great stuff, but you need `printf "'\\\''"` instead of `printf "'\''"` in the pure shell solution (the version on Github, `printf "'"'\''"'"`, breaks altogether). To make the `sed` solution _multi-line_-capable, you need to read _all_ lines up front: `esceval(){ printf '%s\n' "$1" | sed -e ':a' -e '$!{N;ba' -e '}' -e "s/'/'\\\\''/g; s/^/'/; s/$/'/"; }`. Quibble: It's generally [better not to use all-uppercase variable names](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_01) so as to avoid clashes with environment variables and special shell variables. – mklement0 Apr 25 '15 at 15:58
  • @mklement0 Regarding not using uppercase variable names: while I think my usage was safe (between not being a common envar name and being used in a subshell only), my use of uppercase letters was driven by a misreading of the standard - I had taken it to mean that POSIX did not mandate support for lowercase variable names in general (including in the shell), but now upon re-reading, I see that it is merely guaranteeing that all of the defined utilities will use all-capital environment variables. I really appreciate your comment indirectly fixing that misunderstanding for me. – mtraceur Apr 26 '15 at 05:36
  • @mklement0 Regarding the sed string: You're absolutely right, I can't believe I forgot that. I knew there must've been a reason why I used separate printfs in my github version - but when I wrote the answer I hadn't been able to remember why. Wouldn't this also work as another alternative sed variant that supports multi-line arguments, though? `esceval() { printf %s "$1" | sed "1 s/^/'/; s/'/'\\\\''/g; $ s/$/'/"; }` I'll go ahead and edit my answer with a multi-line supporting single-sed-command variant as soon as I can confirm which of the two (your suggestion or mine) is more portable. – mtraceur Apr 26 '15 at 05:42
  • Re: The version on github with its `printf "'"'\''"'"`, as well as the `printf "'\''"` situation in the non-eval'ed-string examples in this answer. I just tested the examples and the github version again on: Busybox v1.23.0 ash, bash 2.05 (so old...), bash 4.3.33, and dash 0.5.7. It worked correctly in dash and ash, but worked wrongly in the two bash shells. I should've known to expect bash being bash as it so often is, and tested more thoroughly... I guess bash's builtin printf has inconsistent backslash escaping behavior compared to the ones in dash/busybox-ash (I trust the latter are more POSIX-y). – mtraceur Apr 26 '15 at 05:59
  • 1
    Thanks for your thoughtful feedback. Re your revised `sed` solution: While both yours and mine should be portable, yours is preferable, because it's simpler and doesn't read all lines at once. However, it requires a few tweaks: (a) the `1 ...` substitution must be placed AFTER the general substitution, so that the latter doesn't also replace the just-added, initial `'`; (b) as in my sed solution, `\n` must be appended to the `printf` command so as to ensure that trailing newlines are accurately preserved: `esceval() { printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1 s/^/'/; $ s/$/'/"; }` – mklement0 Apr 26 '15 at 14:02
  • 1
    Re `printf "'"'\''"'"`: it actually makes sense to me that that would break, because it contains `'\''`, which is an attempt to include a single quote inside a single-quoted string, which isn't supported in a POSIX shell: ["A single-quote cannot occur within single-quotes."](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html) – mklement0 Apr 26 '15 at 14:19
  • 1
    Re `printf "'\''"`: I think the problem here is variations in `printf` behavior, not _shell_ string-parsing: if you use `printf %s "'\''"`, all shells should behave the same again (verified in recent versions of `bash`, `dash`, `ksh`, `zsh`). `printf` is a _builtin_ in most shells, and behavior differs with respect to processing the _format string_ (incidentally, the _utility_ form of `printf` also differs across platforms); by using `%s`, you eliminate these variations. Given that, you could even eliminate the separate `printf` statement and use `printf %s "${unescaped%%\'*}'\''"` instead. – mklement0 Apr 26 '15 at 17:09
  • As an aside: you may be interested in [`shall`](https://www.npmjs.com/package/shall), a cross-shell test tool I wrote. – mklement0 Apr 26 '15 at 17:11
  • 1
    @mklement0 Yes, it was definitely a printf handling-of-format-string incompatibility rather than bad Bourne shell syntax - anyway, I have fixed all of the examples in my post, and my github esceval.sh shim. Also after some contemplation and testing I have now understood why sed-only approach requires the extra newline: because sed interprets newlines as text delimiters, not as part of the text literal. So getting the closing quote after trailing input newlines only works if you "pad" it with one extra newline. I've edited my answer accordingly as well. (Yes, your "shall" tool is of interest.) – mtraceur Apr 28 '15 at 07:55
  • Cool; thanks for updating, and for an interesting conversation. – mklement0 Apr 28 '15 at 15:49
  • 1
    Infinite loop with argument `"it's"` on both `GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)` and `zsh 5.0.5 (x86_64-pc-linux-gnu)` – Tom Hale Sep 03 '16 at 06:50
  • @TomHale Thanks for the bug report. Sorry for delayed reply. Will investigate soon and follow up. – mtraceur Sep 05 '16 at 06:58
  • @TomHale Bug confirmed in the last code snippet in this answer. The version up on my github did not have the bug, nor did the `sed`-based variant right before. A quick comparison reveals the problem: I accidentally omitted the `shift` command in the last code snippet in my answer. Fixing now. – mtraceur Sep 05 '16 at 07:29
  • @TomHale Fixed yesterday, btw - per my last comment. Since I presume you were the downvoter, I'd appreciate if you either reconsidered the downvote now that it's been fixed, or provided further feedback if you think it's still a poor answer. – mtraceur Sep 07 '16 at 06:02
  • 1
    @mklement0 I finally fixed the variable naming from uppercase to lowercase. I think I fixed it years ago in the `esceval` repo, but for some reason I never circled back to doing it for this answer. Nowadays I totally agree that shell variables should never be all-uppercase unless they are meant to be environment variables, since in Bourne-like shells setting a variable could clobber the value of an environment variable of the same name in all child processes of that shell. – mtraceur Feb 01 '22 at 18:56
3

I think this is POSIX. It works by clearing $@ after expanding it for the for loop, but only once so that we can iteratively build it back up (in reverse) using set.

flag=0
for i in "$@"; do
    [ "$flag" -eq 0 ] && shift $#
    set -- "$i" "$@"
    flag=1
done

echo "$@"   # To see that "$@" has indeed been reversed
ls "$@"

I realize reversing the arguments was just an example, but you may be able to use this trick of set -- "$arg" "$@" or set -- "$@" "$arg" in other situations.
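
As a sketch of one of those other situations: the same trick can pull selected options out of "$@" while keeping everything else in order, with no eval and no quoting function at all. The function name run_filtered and the --verbose option are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical example: strip every --verbose flag, keep the other arguments.
run_filtered()
{
    verbose=0
    first=1
    for arg in "$@"
    do
        # "$@" was expanded when the loop started, so it is safe to clear it once.
        if [ "$first" -eq 1 ]
        then
            set --
            first=0
        fi
        case $arg in
        --verbose) verbose=1 ;;            # process it immediately
        *) set -- "$@" "$arg" ;;           # save it for later
        esac
    done
    # "$@" now holds only the saved arguments.
    if [ "$#" -gt 0 ]
    then
        printf '<%s>\n' "$@"               # stand-in for: exec othercmd "$@"
    fi
}
```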

And yes, I realize I may have just reimplemented (poorly) ormaaj's Push.

chepner
  • 1
    Interesting, but probably too specific to the reversing example I happened to choose. More typically I would want to collect multiple argument lists, processing options, etc. – Jesse Glick Aug 28 '12 at 15:36
  • To my recent surprise, `(( expr ))` is not POSIX, although it is widely supported. If you use the construct, all variables are automatically interpolated (no `$` needed), which is nice for more complex expressions. – Henk Langeveld Aug 28 '12 at 16:29
  • I skimmed the spec too quickly; I thought I saw it there. I'll replace with something POSIX, but anyone reading this should feel free to edit my answer with a better test. – chepner Aug 28 '12 at 16:36
1

Push. See the readme for examples.

ormaaj
1

The following seems to work with everything I have thrown at it so far, including spaces, both kinds of quotes and a variety of other metacharacters, and embedded newlines:

#!/bin/sh
quote() {
    echo "$1" | sed "s/'/'\"'\"'/g"
}
args=
for arg in "$@"
do
    argq="'"`quote "$arg"`"'"
    args="$argq $args"
done
eval "ls $args"
Jesse Glick
  • This does not handle an argument containing a newline. In fact it is very dangerous because the text in the argument after the newline will be executed as another shell command. Don't run this on your web server! – mark4o Aug 28 '12 at 16:09
  • If you give it an argument `'-n'`, it will be turned into `''` when quoting, because `echo` will parse it as an option. Better change `echo "$1"` into `printf "%s" "$1"`. (dash, Linux) – joeytwiddle Dec 23 '15 at 07:20
  • $ `eval echo quote "it's"` `-bash: unexpected EOF while looking for matching \`''` `-bash: syntax error: unexpected end of file` – Tom Hale Sep 03 '16 at 07:44
1

If you're okay with calling out to an external executable (as in the sed solutions given in other answers), then you may as well call out to /usr/bin/printf. While it's true that the POSIX shell built-in printf doesn't support %q, the printf binary from Coreutils sure does (since release 8.25).

esceval() {
    /usr/bin/printf '%q ' "$@"
}
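
Since /usr/bin/printf is not guaranteed to exist or to understand %q, a hedged variant could probe for it and fall back to a sed-based quoting function otherwise. The names have_printf_q, esceval_posix and quote_arg are my own, and the probe only checks the exit status, which I am assuming is non-zero on implementations without %q. As far as I know, Coreutils may also emit $'...' quoting for control characters, which not every /bin/sh understands, so treat this as a sketch rather than a drop-in replacement.

```shell
#!/bin/sh
# Pure-POSIX fallback: the sed-based quoting shown in another answer.
esceval_posix()
{
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1 s/^/'/; $ s/$/'/"
}

# Succeeds only if /usr/bin/printf exists and accepts %q (assumption:
# implementations without %q exit with a non-zero status).
have_printf_q()
{
    /usr/bin/printf '%q' probe >/dev/null 2>&1
}

# Quote a single argument with whichever method is available.
quote_arg()
{
    if have_printf_q
    then
        /usr/bin/printf '%q' "$1"
    else
        esceval_posix "$1"
    fi
}
```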
Matt Whitlock
0

We can use /usr/bin/printf when the version of GNU Coreutils is at least 8.25:

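#!/bin/sh

minversion="8.25"
# Last field of the first line of `ls --version`, e.g. "8.32" (empty if ls is not GNU)
gnuversion=$(ls --version 2>/dev/null | sed '1q' | awk 'NF{print $NF}')

# Fallback: the built-in printf, which only helps in shells that support %q
printcmd="printf"

# Compare numerically, field by field; a plain string comparison
# would wrongly rank "8.3" above "8.25".
if [ -n "$gnuversion" ] &&
   [ "$(printf '%s\n' "$minversion" "$gnuversion" | sort -t. -k1,1n -k2,2n | sed '1q')" = "$minversion" ]; then
    printcmd="/usr/bin/printf"
fi

# The trailing space in the format keeps the quoted arguments separated
params=$($printcmd "%q " "$@")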
Hoang TO