There are several shell-specific ways to include a ‘unicode literal’ in a string. For instance, in Bash, the quoted string-expanding mechanism, $'', allows us to directly embed a character by its codepoint: $'\u2620'.
However, if you're trying to write universally cross-platform shell scripts (in practice, this usually reduces to “runs in Bash, Zsh, and Dash”), that's not a portable feature.
I can portably achieve anything in the ASCII table (octal number-space) with a construct like the following:
WHAT_A_CHARACTER="$(printf '\036')"
… however, POSIX / Dash printf only supports octal escapes.
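As a minimal sketch of that portable octal route, the byte produced by the construct above can be inspected with od. The variable name char_octal and the od/tr pipeline are illustrative assumptions, not part of the original snippet:

```shell
#!/bin/sh
# POSIX printf understands \ddd octal escapes in its format string,
# so this line behaves the same in dash, bash, and zsh.
WHAT_A_CHARACTER="$(printf '\036')"   # U+001E, RECORD SEPARATOR

# Dump the stored byte back out in octal to confirm what was produced.
# od -An drops the offset column, -to1 prints one octal byte per field,
# and tr strips the padding od adds around it.
char_octal="$(printf '%s' "$WHAT_A_CHARACTER" | od -An -to1 | tr -d ' \n')"
echo "$char_octal"   # prints 036
```

Note that this only confirms the single-byte case; anything beyond one byte would mean spelling out the bytes of a specific encoding in octal, which is exactly the dependency in question here.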
I can also obviously achieve the full Unicode space by farming the task out to a fuller programming environment:
OH_CAPTAIN_MY_CAPTAIN="$(ruby -e 'print "\u2388"')"
TAKE_ME_OUT_TONIGHT="$(node -e 'console.log("\u266C")')"
So: what's the best way to encode such a character into a shell script that:
- works in dash, bash, and zsh,
- shows the hexadecimal encoding of the codepoint in the code,
- isn't dependent on the particular encoding of the string (i.e. not by encoding UTF-8 bytes in octal),
- and finally, doesn't require the invocation of any “heavy” interpreter (let's say, less than 0.01s runtime)?