I'm trying to use a shell script to generate C code for wrapping executables.

This needs to work on Linux and macOS and have as few dependencies as possible. I don't care about Windows (other than WSL2). The generated code should look something like this:

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("X=1");
    putenv("HELLO=WORLD");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

Naive approach:

#!/usr/bin/env bash
# make-c-wrapper.sh EXECUTABLE ARGS
#
# ARGS:
# --argv0       NAME    : set name of executed process to NAME
#                         (defaults to EXECUTABLE)
# --set         VAR VAL : add VAR with value VAL to the executable’s
#                         environment

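# printf is used instead of echo because echo's handling of "\n" differs between shells
printf '%s\n' '#include <unistd.h>' '#include <stdlib.h>' '' 'int main(int argc, char **argv) {'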
executable="$1"
params=("$@")

for ((n = 1; n < ${#params[*]}; n += 1)); do
    p="${params[$n]}"
    if [[ "$p" == "--set" ]]; then
        key="${params[$((n + 1))]}"
        value="${params[$((n + 2))]}"
        n=$((n + 2))
        echo "    putenv(\"$key=$value\");"
    elif [[ "$p" == "--argv0" ]]; then
        argv0="${params[$((n + 1))]}"
        n=$((n + 1))
    else
        # Using an error macro, we will make sure the compiler gives an understandable error message
        echo "    #error make-c-wrapper.sh did not understand argument $p"
    fi
done

printf '    argv[0] = "%s";\n    return execv("%s", argv);\n}\n' "${argv0:-$executable}" "$executable"

But this fails if you try to supply special characters in the input:

./make-c-wrapper.sh /usr/bin/python3 --set "Hello" $'This is"\na test'
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("Hello=This is"
a test");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

What I would have liked to see here is this:

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("Hello=This is\"\na test");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

According to this answer: https://stackoverflow.com/a/12208808/8008396, it seems like I need to escape the following characters to make sure the result is a valid C string literal: ", \, \r, \n, \0 and \?.

Is there an easy way to do this? It needs to work on macOS, not just Linux.

Tobias Bergkvist
  • Have you tried escaping the escape character `'\'`? Something like `--set "Hello" "This is\\\"\\na test"`. Although I don't know if this will work the same for bash parsing: https://godbolt.org/z/9YqnceGeT – yano May 26 '21 at 18:07
  • Ideally I want the shell script itself to deal with escaping things from the input - so that the user of the script doesn't have to think about it – Tobias Bergkvist May 26 '21 at 18:11

1 Answer

it seems like I need to escape the following characters to make sure the result is a valid C string literal: ", \, \r, \n, \0 and \?.

You need to escape ", \, and newline. While you're at it, it makes sense to escape the carriage return. Although there is an escape sequence for ?, that character can also represent itself. Null characters in your input are not representable as elements of a string literal, and your shell probably doesn't handle them in variable values, either, so you would probably be best off not giving them any special consideration.

Shell parameter expansion syntax has a substring replacement feature and a C-like literal syntax that you could leverage. The shell quoting gets a little involved, but for example, this ...

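escape_string_literal() {
    # Escape backslashes first, so the backslashes added below are not doubled
    result=${1//'\'/'\\'}
    # Then escape double quotes, newlines, and carriage returns
    result=${result//\"/'\"'}
    result=${result//$'\n'/'\n'}
    result=${result//$'\r'/'\r'}
}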

escape_string_literal '"boo\"'
echo "${result}"

... prints

\"boo\\\"
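
The newline and carriage-return replacements depend on the difference between `$'\n'` and `'\n'`. The following is a minimal sketch of that difference, assuming bash (zsh and ksh also support `$'...'`; a strictly POSIX `sh` may not). The variable names `newline`, `literal`, `text`, and `escaped` are made up for illustration.

```shell
#!/usr/bin/env bash
# $'...' decodes C-style escapes; plain '...' keeps every character as-is.
newline=$'\n'    # one character: an actual line feed
literal='\n'     # two characters: a backslash followed by the letter n

printf '%s %s\n' "${#newline}" "${#literal}"    # prints: 1 2

# Replacing the former with the latter turns a real newline into a C escape.
# The assignment is left unquoted, as in the function above, so that the
# single quotes in the replacement are removed by the shell.
text=$'line one\nline two'
escaped=${text//$'\n'/'\n'}
printf '%s\n' "$escaped"                        # prints: line one\nline two
```

Printing with `printf '%s\n'` rather than `echo` matters here, because some shells' `echo` interprets backslash sequences in its arguments and would undo the escaping on output.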

Do note, however, that you're not necessarily clear to include all other characters in your string literals. In particular, even though there are no single-character escapes for most of them, other control characters might or might not be accepted as literal characters, depending on your C implementation.
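
If other control characters are a concern, one option is to emit them as octal escapes instead of passing them through. The sketch below is a different technique from the parameter-expansion function above: it walks the input byte by byte with `od` and uses only POSIX shell features, so it should behave the same in bash, zsh, and `sh`. The helper name `escape_c_string` is hypothetical, and escaping every byte outside printable ASCII is an assumption about what you want. Note that a newline comes out as `\012` rather than `\n`, which denotes the same character only on ASCII-based systems.

```shell
#!/bin/sh
# escape_c_string is a hypothetical helper, not part of the function above.
# It writes its argument to stdout, escaped for use inside a C string literal.
escape_c_string() {
    # od prints each input byte as an unsigned decimal number; tr puts one per line.
    printf '%s' "$1" | od -An -v -tu1 | tr -s ' ' '\n' |
    while read -r byte; do
        [ -n "$byte" ] || continue          # skip blank lines left over from od
        case $byte in
            34) printf '%s' '\"' ;;         # double quote
            92) printf '%s' '\\' ;;         # backslash
            *)
                if [ "$byte" -lt 32 ] || [ "$byte" -ge 127 ]; then
                    # control or non-ASCII byte: three-digit octal escape
                    printf '\\%03o' "$byte"
                else
                    # printable ASCII: emit the byte itself via %b's \0ddd form
                    printf '%b' "\\0$(printf '%o' "$byte")"
                fi
                ;;
        esac
    done
}

key=Hello
value='This is"
a test'
printf '    putenv("%s=%s");\n' "$(escape_c_string "$key")" "$(escape_c_string "$value")"
# prints:     putenv("Hello=This is\"\012a test");
```

Three-digit octal escapes are used because a C octal escape ends after at most three digits, so a digit that follows in the input cannot be absorbed into the escape. A `\x` hexadecimal escape has no such limit.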

John Bollinger
  • Seems like this prints `\"boo\\\"` in bash, but `\"boo\\"` in zsh (which is the default shell on macOS). Any idea why? – Tobias Bergkvist May 26 '21 at 19:10
  • Seems like printf might be more consistent across different shells. This produced the same results in both zsh and bash: `printf "%s\n" "${result}"` – Tobias Bergkvist May 26 '21 at 19:18
  • I can't seem to get it working for this though `escape_string_literal "boo\nabc\"\\xyz"; printf "%s\n" $result`. I get `boo\\nabc\"\\xyz` but would have expected to get `boo\nabc\"\\xyz` – Tobias Bergkvist May 26 '21 at 19:36
  • Seems like this is because `$'\n'` and `'\n'` are not the same, even though they look exactly the same when you echo them out. `'\n'` is actually equivalent to `$'\\n'` – Tobias Bergkvist May 26 '21 at 20:40
  • @TobiasBergkvist, even though `zsh` is the default shell on recent macOS, I'm pretty sure that `bash` is still provided there. I would recommend choosing a shell explicitly (add a shebang line specifying it to the script). That will reduce the surface area for divergent behavior. – John Bollinger May 26 '21 at 22:32
  • Yeah, I noticed that creating polyglot code between bash and zsh is not exactly straightforward. I gave up on that after realising that bash starts counting at 0, while zsh starts counting at 1 (when it comes to array indices) – Tobias Bergkvist May 26 '21 at 22:41
  • @TobiasBergkvist, `boo\\nabc\"\\xyz` is the correct result in that case. I suspect you're being thrown off by the details of the shell's quoting behavior, and perhaps by the fact that shell single- and double-quoted strings do not recognize C-style escapes. That is, in fact, precisely what makes `$'\n'` different from `'\n'`. The former style recognizes (many) C-style escape sequences, whereas the latter does not recognize any escape sequences at all. To the shell, `'\n'` represents a two-character string. – John Bollinger May 26 '21 at 22:41