Is there a standard shell command that converts a byte sequence containing a mix of ASCII and non-ASCII characters into an all-ASCII sequence, keeping every printable non-whitespace ASCII character intact and turning everything else (non-ASCII bytes and whitespace) into \x-notation escapes understood by echo -e?

For example, take the string ʃBC\n. Note that the first symbol is the Latin letter "esh" and the last is a newline; the second and third are the ASCII characters B and C. In UTF-8 this string encodes to the bytes ca 83 42 43 0a. The command I'm looking for should turn the original string into \xca\x83BC\x0a, so that I can print the original string with echo -ne "\xca\x83BC\x0a", assuming UTF-8 encoding is used.

Cyrus
Alex
  • No, but you should not use `echo -e` anyway. `printf '%q'` produces output which is similar to what you describe, which is suitable for feeding back to `printf`. – tripleee Feb 16 '21 at 06:04
  • @tripleee Not quite; the `%q` format produces something suitable for parsing by bash as part of a command line. That generally doesn't include processing multibyte unicode sequences into escaped hex, but if the string includes control characters (like newline) it'll generally render it as a properly quoted ANSI-C-type string, like `$'ʃBC\n'` (i.e. the "$" and single-quotes are part of the output). But for this particular string, older versions (tested in 4.2.10 and 3.2.57) seem to convert the second byte of "ʃ" into an escaped octal code but leave the first byte alone. – Gordon Davisson Feb 16 '21 at 07:06
  • Do you need something *specifically* for `echo`, or would some other ASCII encoding like `base64` work? – l0b0 Feb 16 '21 at 08:36
  • @l0b0 Yes, specifically for `echo`. Part of the reason is to be able to edit the ASCII symbols by hand. Full context: I have a bunch of small binary files with ASCII strings in them, which can't be opened in an editor as is. I want to be able to transform these files to ASCII, edit the strings and save them back via `echo -e`. – Alex Feb 16 '21 at 21:04
  • @tripleee Not sure if I'm using it wrong, but the following doesn't work for me (Mac OS X 10.12): `printf '%q' $"ʃasd\n"` yields `$'?\203asd\\n'` and: `$ printf $'?\203asd\\n'` yields `??asd` – Alex Feb 16 '21 at 21:05
  • @tripleee by the way, what's wrong with `echo -e`? Or is it just an absence of readily available binary transformation that I'm looking for? – Alex Feb 16 '21 at 21:08
  • @Alex That's the bug I mentioned on older versions of bash. As for `echo -e`, it's generally considered untrustworthy and inconsistent (between versions, runtime and compile-time options, etc). Plus, if you enter the string for `echo -e` on the command line, it goes through two levels of parsing/processing: first shell quote&escape parsing, then `echo`'s `-e` escape parsing. This can get messy. – Gordon Davisson Feb 17 '21 at 00:22
  • https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – tripleee Feb 17 '21 at 05:13

1 Answer

Would this achieve what you want?

#!/usr/bin/env bash

# Python 3 is required; under Python 2 this raises UnicodeDecodeError on non-ASCII input.
python3 -c 'import sys;print(str(sys.argv[1].encode("utf-8"))[2:-1])' "$1"

Called with:

$ test.sh $'ʃBC\n'
\xca\x83BC\n

This requires Python 3.
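
The one-liner above relies on Python's `repr` of a bytes object, so it prints the newline as `\n` rather than `\x0a` and leaves spaces and quotes untouched. A stricter variant that matches the format asked for in the question might look like the sketch below. The function name `to_hex_escapes` and the `UNSAFE` set are illustrative assumptions, not part of any standard tool; escaping backslash, double quote, `$` and backtick goes slightly beyond the question, on the assumption that the result will be pasted inside double quotes.

```python
# Assumption: these printable ASCII bytes are escaped as well, because they
# are special inside a double-quoted shell string or to escape parsing.
UNSAFE = set(b'\\"$`')


def to_hex_escapes(data: bytes) -> str:
    """Keep printable non-whitespace ASCII, write every other byte as \\xHH."""
    out = []
    for b in data:
        if 0x21 <= b <= 0x7E and b not in UNSAFE:
            out.append(chr(b))          # safe printable ASCII, kept as is
        else:
            out.append("\\x%02x" % b)   # anything else becomes \xHH
    return "".join(out)


# The example string from the question, encoded as UTF-8.
print(to_hex_escapes("ʃBC\n".encode("utf-8")))  # \xca\x83BC\x0a
```

Because the function works on bytes rather than text, it should also handle input that is not valid UTF-8, for example the contents of a file opened in binary mode.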

Philippe
  • There's a slight caveat that `str()` here will switch between `'` and `"` quotes depending on what's inside, and doesn't necessarily escape them. So an input like `don't` will give a literal single quote in the output -- and similarly for double quotes. – ilkkachu Feb 16 '21 at 14:19
  • @ilkkachu, give an example where `"` or `'` are involved and 'echo -en' gives a different result. – Philippe Feb 16 '21 at 14:39
  • the Q said they wanted to print the original string back with `echo -ne "\xca\x83BC\x0a"`. That looks like they're running it through the shell. Now, if the output contains double quotes, like in the output you get for the input `he said "hi"`, that command would be `echo "he said "hi""` which outputs `he said hi`, losing the quotes. – ilkkachu Feb 16 '21 at 14:57
  • Shell expansions would be another problem, since `str()` doesn't escape the `$`. – ilkkachu Feb 16 '21 at 14:58
  • There should be no issues : `echo "$(bash test.sh 'he said "hi"')"` gives `he said "hi"` – Philippe Feb 16 '21 at 16:29
  • Depends on how it's done, with shell substitutions or by building the command line externally. The question doesn't show, so it's good to know what the limitations are. – ilkkachu Feb 16 '21 at 17:30
  • I've got the idea, but for some reason it doesn't work for me (Mac OS X 10.12, standard terminal): `python -c 'import sys;print(str(sys.argv[1].encode("utf-8"))[2:-1])' $'BCBCBC'` yields `BCB`, and `python -c 'import sys;print(str(sys.argv[1].encode("utf-8"))[2:-1])' $'ʃBC\n'` yields: `Traceback (most recent call last): File "<string>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 0: ordinal not in range(128)` – Alex Feb 16 '21 at 20:55
  • Also I was hoping that one of the existing utilities somehow supports what I'm looking for - e.g. od, xxd or hexdump. No problems with python solution though. – Alex Feb 16 '21 at 20:58
  • @Alex Strange, `python -c 'import sys;print(str(sys.argv[1].encode("utf-8"))[2:-1])' $'BCBCBC'` gave me `BCBCBC`. – Philippe Feb 16 '21 at 21:33
  • @Alex Do you have python3 installed on your Mac? If so, can you try `python3 ...`? – Philippe Feb 16 '21 at 22:19
  • You're likely right, I have python 2, unfortunately my version of Mac OS X is a bit dated. – Alex Feb 17 '21 at 06:24
  • It'd be worth calling out that python 3 is needed to run the above reliably. – Alex Feb 17 '21 at 06:29
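
For the round trip described in the question comments (escape a file, edit the strings by hand, save the bytes back), the reverse step can also be done without `echo -e`, which sidesteps the portability concerns raised there. The following is only a sketch: `from_hex_escapes` is a hypothetical helper name, it understands only two-digit `\xHH` escapes (not `\n`, `\t` or octal forms), and it assumes any literal backslash in the data was itself written as `\x5c`.

```python
import re

# A literal backslash, an "x", then exactly two hex digits.
HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def from_hex_escapes(text: bytes) -> bytes:
    """Turn every \\xHH escape back into the single byte it stands for."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), text)


# The escaped form from the question decodes back to the original UTF-8 bytes.
restored = from_hex_escapes(rb"\xca\x83BC\x0a")
print(restored == "ʃBC\n".encode("utf-8"))  # True
```

Writing `restored` to a file opened in binary mode would then reproduce the original bytes, with no shell quoting involved.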