I need to convert a Chinese string into the hex escape sequences of its UTF-8 bytes. I can do it using sed in the following way:

echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g' | sed 's/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'

which gives me output as:

\xE6\xAC\xA2\xE8\xBF\x8E

This is the output I am looking for. How can I use sed more efficiently in the command above? The command is being run in an Ubuntu 16.04 terminal.

Pavan
  • 2
    If you want to be efficient, why not use a tool designed exactly for that, like `od`? – phuclv Feb 05 '18 at 04:48
  • 2
    [The Perl snippet I posted in your previous question](https://stackoverflow.com/questions/48537793/convert-unicoded-string-to-corresponding-string-in-c/48615953#comment84070882_48537793) should be way more efficient than a pipeline of multiple processes. See also [combining 2 sed commands](https://stackoverflow.com/questions/7657647/combining-2-sed-commands) – tripleee Feb 05 '18 at 04:51
  • 2
    If your host language is C, it's not hard to do this conversion in native C, either. – tripleee Feb 05 '18 at 04:52
  • Process substitution is available in bash `sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/;s/\(.*\)\\x/\1 /' < <(echo -n 欢迎 | xxd -p -u)` – David C. Rankin Feb 05 '18 at 04:55
  • This is definitely an [XY problem](https://meta.stackexchange.com/q/66377/230282). In C the simplest way to do this is simply print each character with `printf("\\x%02X", byte[i])`. If you want to read like in your other question, the reverse can be done with scanf – phuclv Feb 05 '18 at 05:04
  • @tripleee My host language is C. Can you let me know how to do this in C? – Pavan Feb 05 '18 at 05:04
  • Another using bash parameter expansions and `od` would be `a=$(echo -n 欢迎 | od -A none -t x1); a=${a^^}; a=${a// /\\x}; echo $a` – David C. Rankin Feb 05 '18 at 05:11
  • There's a *lot* of questions about this but it's hard to find one which does *exactly* what you are asking, perhaps because I'm not a C programmer. This one looks fairly close: https://stackoverflow.com/questions/7369344/how-to-unescape-strings-in-c-c – tripleee Feb 05 '18 at 05:55

2 Answers

0

You can chain sed-commands with ";":

echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'
\xE6\xAC\xA2\xE8\xBF\x8E 

Since you mix sed and sed -r (basic and extended regex syntax), the last call has to be rewritten in basic syntax, with the group parentheses escaped, before it can be merged with the others:

echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/;s/\(.*\)\\x/\1 /'

Taking a second look at what xxd outputs before sed touches it, I noticed that the solution is much simpler:

echo -n 欢迎 | xxd -p -u | sed -r 's/(..)/\\x\1/g'

Your initial approach appended \x after each pair of characters, but you can prepend it to each pair instead. Chaining multiple sed commands is still a useful thing to know, though.
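As a minimal sketch, the pair-prefixing substitution can be wrapped in a small function. The name `hex_to_escapes` is made up for illustration. It assumes a plain hex dump on stdin, such as `xxd -p -u` produces. Because `xxd -p` wraps its output at 30 bytes per line by default, the sketch joins the lines with `tr` before substituting, so longer strings do not end up with newlines in the middle of the result:

```shell
# Hypothetical helper: read a plain hex dump on stdin and
# prefix every pair of hex digits with \x.
hex_to_escapes() {
    # Join wrapped lines first, then prepend \x to each pair.
    # In the sed replacement, \\ is a literal backslash and & is the match.
    tr -d '\n' | sed 's/../\\x&/g'
    echo    # restore the final newline that tr removed
}

# Usage (assumes xxd is installed):
#   printf '%s' '欢迎' | xxd -p -u | hex_to_escapes
#   \xE6\xAC\xA2\xE8\xBF\x8E
```

This uses basic regex syntax only, so `-r` is not needed.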

user unknown
0

From an efficiency standpoint, about the best option I could come up with would be to replace xxd, 3 pipes, and 3 calls to sed with od and 2 bash parameter expansions. (There may be more efficient ways, but this was what came to mind.)

For example, you could assign the result of the command substitution $(printf "欢迎" | od -A none -t x1) to a variable, which would then contain ' e6 ac a2 e8 bf 8e'. Then it is simply a matter of converting to upper case and replacing each space with '\x' (both provided by bash parameter expansions), e.g.

a=$(printf "欢迎" | od -A none -t x1); \
a=${a^^}; \
a=${a// /\\x}; \
echo "$a"
\xE6\xAC\xA2\xE8\xBF\x8E

(shown with line-continuations above, you can just copy/paste into your terminal to test)
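For strings longer than one od output line (16 bytes), the one-liner needs a little more care: od breaks its output across lines, and without `-v` it collapses repeated lines into `*`. The following is a rough sketch of one way to handle that, not the exact method above. It swaps the bash parameter expansions for `tr` and for `printf`'s reuse of its format string for each argument, which also keeps it usable in plain POSIX sh. The function name `to_hex_od` is made up for illustration:

```shell
# Hypothetical helper: print the bytes of "$1" as upper-case \xHH escapes.
to_hex_od() {
    # -An drops the offset column, -v disables "*" line collapsing,
    # -tx1 prints one byte per field. The substitution is deliberately
    # unquoted so that each hex byte becomes its own positional parameter.
    set -- $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F')
    # printf repeats its format once per argument: \xE6, \xAC, ...
    [ "$#" -gt 0 ] && printf '\\x%s' "$@"
    echo
}

# Usage:
#   to_hex_od '欢迎'
#   \xE6\xAC\xA2\xE8\xBF\x8E
```

The `set --` call only replaces the positional parameters inside the function, so the caller's arguments are untouched.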

From Your Request in Comment for C

The code in C to output the upper-case hex bytes contained in your string is trivial, e.g.

#include <stdio.h>

int main (void) {

    char *s = "欢迎";

    while (*s)  /* output each byte as two upper-case hex digits */
        printf ("\\x%02hhX", (unsigned char)*s++);
    putchar ('\n');

    return 0;
}

Example Use/Output

$ ./bin/str2hexbytes
\xE6\xAC\xA2\xE8\xBF\x8E

(note: you could use the exact-width types in stdint.h and the exact-width format specifiers provided in inttypes.h for a more formal solution, but it would accomplish the same thing. Similarly, you could use wide-character types, but virtually all modern compilers have no problem handling multibyte characters in an ordinary string or array of char, provided the source file is saved as UTF-8.)

David C. Rankin