How to combine 'utf-8' and 'unicode_escape' to correctly decode b'\xc3\xa4\\n-\\t-\\"foo\\"'?

Question

I have a library that gives me encoded and escaped byte sequences like this one:

a=b'\xc3\xa4\\n-\\t-\\"foo\\"'

Which I want to translate back to:

ä
-   -"foo"

I tried simply calling a.decode(), which decodes the UTF-8 bytes as wanted:

>>> a.decode()
'ä\\n-\\t-\\"foo\\"'

But it does not unescape the backslash sequences. Then I found 'unicode_escape', which unescapes them but garbles the 'ä':

>>> print(a.decode('unicode_escape'))
Ã¤
-   -"foo"
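The garbled Ã¤ appears because 'unicode_escape' treats every input byte as a Latin-1 character, so the two UTF-8 bytes \xc3\xa4 become the two characters 'Ã' and '¤'. Latin-1 maps bytes 0x00-0xFF one-to-one onto code points U+0000-U+00FF, so a minimal sketch of a repair is to encode the result back to Latin-1 and decode that as UTF-8. This assumes the escapes only ever produce characters below U+0100; an escape such as \u20ac in the input would make the Latin-1 step raise UnicodeEncodeError.

```python
a = b'\xc3\xa4\\n-\\t-\\"foo\\"'

# 'unicode_escape' processes the backslash escapes but maps each raw byte
# to the code point with the same value (Latin-1), splitting the UTF-8 pair.
mangled = a.decode('unicode_escape')
print(repr(mangled))  # 'Ã¤\n-\t-"foo"'

# Encoding as Latin-1 turns those code points back into the original raw
# bytes (with the escapes already resolved), which are then valid UTF-8.
# Assumption: no escape in the input produced a character above U+00FF.
fixed = mangled.encode('latin-1').decode('utf-8')
print(fixed)
```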

Is there a way to decode and unescape the given sequence with a builtin method (i.e. without having to .replace('\\n', '\n').replace(...))?
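A sketch that uses only documented codecs and also copes with characters outside Latin-1 reverses the order: decode the UTF-8 first, then hand the text to 'unicode_escape' via a Latin-1 encoding with the 'backslashreplace' error handler, so any non-Latin-1 character becomes a \uXXXX escape that 'unicode_escape' turns back into the same character. The helper name unescape is hypothetical, and the sketch assumes the library only emits Python-style escapes.

```python
def unescape(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes that also contain
    Python-style backslash escapes, using only documented codecs."""
    # Step 1: decode the real UTF-8 bytes ('ä'), leaving escapes as text.
    text = raw.decode('utf-8')
    # Step 2: go back to bytes in Latin-1, the encoding 'unicode_escape'
    # assumes; characters outside Latin-1 become \uXXXX escapes instead
    # of raising an error.
    latin1 = text.encode('latin-1', 'backslashreplace')
    # Step 3: let 'unicode_escape' resolve \n, \t, \" and any \uXXXX.
    return latin1.decode('unicode_escape')


a = b'\xc3\xa4\\n-\\t-\\"foo\\"'
print(unescape(a))
```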

It would also be interesting to know how I can reverse this operation (i.e. get the same byte sequence back from the translated result).
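Going the other way, the built-in 'unicode_escape' encoder does not reproduce the original bytes: it writes 'ä' as \xe4 instead of raw UTF-8 and leaves double quotes unescaped. A minimal sketch with str.translate can rebuild the input, assuming (a guess about the library, not something stated here) that it escapes exactly backslash, newline, tab and double quote and keeps everything else as raw UTF-8. The helper name escape is hypothetical.

```python
# Assumption: the library escapes exactly these four characters and
# leaves all other characters as raw UTF-8 bytes.
_ESCAPES = str.maketrans({
    '\\': '\\\\',
    '\n': '\\n',
    '\t': '\\t',
    '"': '\\"',
})


def escape(text: str) -> bytes:
    """Hypothetical inverse of the decoding step."""
    return text.translate(_ESCAPES).encode('utf-8')


a = b'\xc3\xa4\\n-\\t-\\"foo\\"'
decoded = 'ä\n-\t-"foo"'
print(escape(decoded) == a)  # True

# For comparison, the built-in codec produces a different byte sequence:
print(decoded.encode('unicode_escape'))  # b'\\xe4\\n-\\t-"foo"'
```

str.translate replaces all characters in a single pass, so the backslashes it inserts are never escaped a second time; with chained .replace() calls, the backslash replacement would have to run first.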

Sorry - your question is not understandable after your "edit" - you provided an answer, and then changed the problem you have. One can't know what you want. – jsbueno, Dec 15 '16 at 13:10
Better now? I intended to provide a *possible* solution which shows what I try to achieve. It was not the answer I was hoping to find. – frans, Dec 15 '16 at 13:18

score 1 · Accepted Answer · answered Dec 15 '16 at 13:14


There is a way to do what I want, and I can almost go the other way too, but in my eyes it's ugly and incomplete, so I hope it's not the best option I have:

>>> import codecs
>>> decoded = codecs.escape_decode(a)[0].decode()
>>> print(decoded)
ä
-   -"foo"
>>> reencoded = codecs.escape_encode(decoded.encode())
>>> print(reencoded)
(b'\\xc3\\xa4\\n-\\t-"foo"', 11)      <--- quotes are not escaped

frans

This is the only way I have found to make this work, yet it is actually an undocumented private internal function: https://bugs.python.org/issue30588 – Hack5 Oct 26 '19 at 19:05
