Python3 - reinterpret string as bytes

Question

I have a string in Python that contains literal byte escape sequences mixed with ordinary ASCII characters; for example, print(s) outputs:

The message - \xe3\x83\xa9\xe3\x82\xa4 ... that was the message

Is there an easy way to reinterpret this string as bytes in Python and then decode it as UTF-8? Or do I have to manually search for and separate out the escaped byte sequences myself?

If I could declare the string as a literal, I would be fine, e.g.,

b"The message - \xe3\x83\xa9\xe3\x82\xa4 ... that was the message".decode('utf-8')

but unfortunately I have a string variable.
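To make the situation concrete, here is a minimal sketch of what such a variable looks like. The raw string is only a stand-in (an assumption) for however the real value was obtained; the point is that each backslash is an actual character, not the start of an escape sequence.

```python
# Hypothetical reproduction: a raw string stands in for the variable,
# so "\xe3" is four separate characters rather than one byte.
s = r"The message - \xe3\x83\xa9\xe3\x82\xa4 ... that was the message"

print(s)         # prints the escapes literally, as shown in the question
print(s[14:18])  # the four characters \, x, e, 3

# Encoding the str maps each of those characters to its own byte;
# it does not collapse "\xe3" into the single byte 0xE3.
raw = s.encode("utf-8")
print(raw[14:18])  # b'\\xe3'
```

This is why a plain `s.encode(...)` followed by `.decode('utf-8')` just returns the same text with the backslashes intact.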

Your example for the string literal does not seem to work on Python 3.6 — vidstige, May 12 '18 at 05:07
@user202729 - Not at all. If you use bytes(str) you have to specify the encoding, which will convert the literal byte characters to bytes as opposed to their unicode equivalent. What makes this different is that we are dealing with a string containing characters that describe bytes, as opposed to the bytes themselves. — Matt, May 12 '18 at 05:09
@vidstige - my apologies, I truncated the bytes incorrectly - I've updated the question. — Matt, May 12 '18 at 05:12
The accepted answer there also says that you can use an iterable, and it's possible to convert a string to an iterable. — user202729, May 12 '18 at 05:13
@user202729 You're still stuck with literal bytes as opposed to unicode characters, regardless of the encoding. The bytes in my string are literally the characters, e.g., "\", "x", "e" and "3" for the first byte shown there. You can iterate over it, but you'll have to detect the "\x" sequence, find the end of the byte and convert each one individually. There must be a better way. — Matt, May 12 '18 at 05:15
Can you explain why a variable won't work, but the literal works? Can you give the error message? — vidstige, May 12 '18 at 05:15
Just to be clear, you mean to say `print(my_string)` gives what you are showing? — juanpa.arrivillaga, May 12 '18 at 05:16
[That works for me](https://tio.run/##pc6xboNADAbg/Z7iVzoAUjkpTYdk4C06shhw4KTkDp1NAk9PTyjp0CFLBku/LP@fPC46BH9YV1FU2P0MjCuLUM8oUc98qOdjGjo98lfK37DWQgdS3ElS@OvsjOF55Fa5S1rzNmc7bkPHeTbpuTxmhTEfmIRBkMkpNRcG@3ThfG/G6LzmonbbpM6F1PlynxX/FVQVnm9uZBv8jaNCA5xy3NxwRvK45ygPulmUKUZa8iuNeYjdp2jxGl/XXw). No idea what your problem is. You're not clear enough. — user202729, May 12 '18 at 05:18
@vidstige - There is no error message, I'm just stuck with a string with ascii characters representing bytes as opposed to a unicode string. I've just worked out a solution, posting it now. — Matt, May 12 '18 at 05:19
Then probably [this](https://stackoverflow.com/questions/1885181/how-do-i-un-escape-a-backslash-escaped-string-in-python). — user202729, May 12 '18 at 05:19
@juanpa.arrivillaga yes, without the quotation marks, but for the first example, yes. I'll update the question. — Matt, May 12 '18 at 05:20
@juanpa.arrivillaga The lowest voted answer there does. As the Python3 question is a subset of the Python question, should they be merged? — user202729, May 12 '18 at 05:27
So, the solution in Python 3 is: `import codecs; print(codecs.escape_decode(s)[0].decode())` — juanpa.arrivillaga, May 12 '18 at 05:30
Given that `codecs.escape_decode` [is undocumented](https://bugs.python.org/issue25270) and there's no reason to believe it couldn't be removed, this may be a place where the use of `eval`/`ast.literal_eval` would make sense — juanpa.arrivillaga, May 12 '18 at 05:33
Thanks @juanpa.arrivillaga, I was concerned about eval because of the security issues, but the codecs module works. — Matt, May 12 '18 at 05:37
@Matt `import ast; ast.literal_eval` should be safe. Again, the docs in CPython mention that `codecs.escape_decode` might be removed, and it is undocumented. So, if this is a one-time thing then go ahead, but I would be wary of putting it into production — juanpa.arrivillaga, May 12 '18 at 05:39
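For reference, the approaches mentioned in the comments can be sketched side by side. This is only an illustration under stated assumptions: `s` is a raw-string stand-in for the real variable, and the third option (a round trip through `unicode_escape` and `latin-1`) is not from the thread but is a commonly used alternative built from documented codecs.

```python
import ast
import codecs

# Hypothetical input: a raw string stands in for the real variable.
s = r"The message - \xe3\x83\xa9\xe3\x82\xa4 ... that was the message"

# Option 1 (from the comments): undocumented helper that returns a
# (bytes, consumed_length) tuple; may change or disappear in the future.
via_codecs = codecs.escape_decode(s)[0].decode("utf-8")

# Option 2 (from the comments): let ast.literal_eval parse a bytes literal.
# Assumes s is pure ASCII and has no unescaped double quote or newline.
via_ast = ast.literal_eval('b"' + s + '"').decode("utf-8")

# Option 3 (assumption, not from the thread): documented codecs only.
# unicode_escape turns "\xe3" into U+00E3, latin-1 maps that back to byte 0xE3.
via_unicode_escape = (
    s.encode("latin-1")
    .decode("unicode_escape")
    .encode("latin-1")
    .decode("utf-8")
)

print(via_codecs)  # The message - ライ ... that was the message
```

All three produce the same text for this input. Option 3 relies on the string containing only characters in the Latin-1 range, so it is not a drop-in replacement when the input already holds other non-ASCII characters.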
