Python 3 - decode escaped string

Question

I am working on a simple Python script. Unfortunately, some of the data I must work with is stored as follows:

My data

trouble_string = '{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'

What I want to achieve

I want to convert the string into the following format:

decoded_string = '{"N": "Centrálna nervová sústava"}'

Problem

You can see that the accented letters are numerically encoded. Is there a smart way to decode this string?
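A quick check makes the "numerical encoding" concrete. This is a sketch that assumes the string is typed exactly as in "My data": each accented letter is stored as six ordinary characters (a backslash, `u`, and four hex digits), and those hex digits are the letter's Unicode code point.

```python
trouble_string = '{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'

# In the source literal, \" is just " and \\ is one backslash,
# so the runtime string holds the text \u00e1 as six characters.
escape = '\\u00e1'
print(len(escape))                # 6
print(escape in trouble_string)   # True

# The four hex digits are the Unicode code point of the letter.
print(chr(int(escape[2:], 16)))   # á
print(hex(ord('á')))              # 0xe1
```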

What I tried

bytes(s, encoding='utf-8').decode(encoding='utf-16')
# outputs: '䌢湥牴畜〰ㅥ湬\u2061敮癲癯畜〰ㅥ猠畜〰慦瑳癡≡'

bytes(s, encoding='utf-16').decode(encoding='utf-8')
# outputs: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
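Both attempts reinterpret raw bytes instead of undoing the backslash escaping. A minimal sketch of what goes wrong, using `utf-16-le` explicitly only to fix the byte order (the plain `utf-16` codec depends on a BOM or the platform):

```python
# Attempt 1: UTF-8 bytes of ASCII text read as UTF-16 pair up
# neighbouring characters into single (mostly CJK) code points.
pair = '"C'.encode('utf-8')                  # bytes 0x22 0x43
print(hex(ord(pair.decode('utf-16-le'))))    # 0x4322, shown as 䌢

# The escape prefix \u turns into U+755C the same way.
print(hex(ord('\\u'.encode('utf-8').decode('utf-16-le'))))  # 0x755c

# Attempt 2: encoding to UTF-16 prepends a byte order mark
# (0xFF 0xFE or 0xFE 0xFF), which is never a valid UTF-8 start byte.
data = 'x'.encode('utf-16')
print(data[:2] in (b'\xff\xfe', b'\xfe\xff'))  # True
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc.reason)                        # invalid start byte
```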

The assumption that anything here is UTF-16 is false. This is just regular backslash escaping. — tripleee, Sep 25 '19 at 14:54
@tripleee Looks like utf-8 and utf-16 have different way of escaping - Check out https://convertcodes.com/unicode-converter-encode-decode-utf/. — Fusion, Sep 25 '19 at 15:27
No, the page you are linking to is confused. Neither UTF-8 nor UTF-16 has any backslash escaping mechanism. The notation `\u1234` is simply Python's way of representing a Unicode character (and at this point it has no encoding at all, neither UTF-8 nor UTF-16). JSON is by definition UTF-8 but uses precisely this representation, too, so my vote is on deceze's answer. — tripleee, Sep 25 '19 at 15:29
Maybe see also https://stackoverflow.com/questions/32499846/is-utf-16-compatible-with-utf-8 — tripleee, Sep 26 '19 at 04:11
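To illustrate tripleee's point about JSON: Python's own `json` module emits exactly this notation by default, because `ensure_ascii=True` escapes every non-ASCII character. That the data was actually produced this way is only an assumption, but the match is suggestive:

```python
import json

record = {"N": "Centrálna nervová sústava"}

# Default ensure_ascii=True: non-ASCII letters become \uXXXX escapes.
escaped = json.dumps(record)
print(escaped)   # {"N": "Centr\u00e1lna nervov\u00e1 s\u00fastava"}

# This is character-for-character the string from the question.
trouble_string = '{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'
print(escaped == trouble_string)   # True
```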

score 3 · Answer 1 · answered Sep 25 '19 at 14:56

It looks like JSON, so why not decode it and then encode it the way you prefer?

>>> import json
>>> json.loads('{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}')
{'N': 'Centrálna nervová sústava'}
>>> json.dumps(json.loads('{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'), ensure_ascii=False)
'{"N": "Centrálna nervová sústava"}'
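The same round trip can be wrapped in a small helper. `unescape_json_text` is a hypothetical name, and the sketch assumes the input is always a complete JSON document:

```python
import json

def unescape_json_text(text: str) -> str:
    """Parse a JSON document (hypothetical helper) and re-serialize it
    with literal non-ASCII characters instead of \\uXXXX escapes."""
    return json.dumps(json.loads(text), ensure_ascii=False)

trouble_string = '{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'
print(unescape_json_text(trouble_string))
# {"N": "Centrálna nervová sústava"}

# If only the value is needed, skip the re-serialization step.
print(json.loads(trouble_string)["N"])   # Centrálna nervová sústava
```

`ensure_ascii=False` is what keeps the accented letters literal. Because the text is re-serialized, whitespace in the output follows `json.dumps` defaults rather than the original formatting.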

deceze

Importing a module, parsing the JSON and dumping it back to a string seems a bit overkill. – Olvin Roght Sep 25 '19 at 15:16

If this is intended to be JSON encoding, then it's the correct thing to do. – deceze Sep 25 '19 at 15:21

score -1 · Accepted Answer · answered Sep 25 '19 at 14:30

trouble_string = '{\"N\": \"Centr\\u00e1lna nervov\\u00e1 s\\u00fastava\"}'
result = trouble_string.encode().decode("unicode-escape")

Quote from the docs:

unicode_escape - Produce a string that is suitable as Unicode literal in Python source code.
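One caveat: the `unicode_escape` codec decodes its input bytes as Latin-1, so `str.encode()` (UTF-8) followed by this decode garbles any non-ASCII character that is already present literally. A commonly used workaround, sketched below under the assumption that the text contains no other backslash sequences you want to keep, encodes with `latin-1` and the `backslashreplace` error handler first. `unescape_python` is a hypothetical helper name.

```python
mixed = 'é and \\u00e1'   # one literal é, one escaped á

# UTF-8 bytes of é (0xC3 0xA9) are read back as Latin-1 'Ã©'.
print(mixed.encode().decode('unicode_escape'))   # Ã© and á

# Hypothetical helper: Latin-1 keeps é as a single byte, and characters
# outside Latin-1 are first turned into \uXXXX escapes, then decoded back.
def unescape_python(text: str) -> str:
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(unescape_python(mixed))            # é and á
print(unescape_python('€ \\u00e1'))      # € á
```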

Olvin Roght

Your `trouble_string` doesn't contain all the backslashes you put there, though; and the code you posted doesn't actually produce Unicode literals in the result. – tripleee Sep 25 '19 at 15:04
@tripleee, I've copied it from question, so it contains right what it should contain. – Olvin Roght Sep 25 '19 at 15:10
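The two answers agree on this particular input but can diverge on other valid JSON. A sketch with two assumed inputs: a character outside the Basic Multilingual Plane, which JSON writes as a surrogate pair, and an escaped quote inside a value:

```python
import json

# U+1F600 written the way JSON escapes non-BMP characters.
emoji_json = '"\\ud83d\\ude00"'

# json combines the surrogate pair into one code point.
print(len(json.loads(emoji_json)))          # 1

# unicode_escape decodes each \uXXXX on its own, leaving two lone
# surrogates that cannot even be encoded to UTF-8.
raw = emoji_json.encode().decode('unicode_escape')
print(len(raw))                             # 4 (two quotes + two surrogates)

# An escaped quote: unicode_escape drops the backslash, so the
# result is no longer valid JSON, while the json round trip keeps it.
quoted = '{"q": "say \\"hi\\""}'
print(quoted.encode().decode('unicode_escape'))             # {"q": "say "hi""}
print(json.dumps(json.loads(quoted), ensure_ascii=False))   # {"q": "say \"hi\""}
```

For data that really is JSON, this suggests parsing it as JSON is the more robust choice; `unicode_escape` works when the text is known to use only Python-compatible escapes.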
