How to decode UTF-8 bytes stored as escape sequences in a Python string

Question

I have a string that contains UTF-8 bytes written as octal escape sequences:

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

I need to decode it, but the only way I have found so far is:

result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')

This is not safe, so is there a better way?

A super roundabout way: `s.encode('latin-1').decode("unicode_escape").encode('latin-1').decode('utf-8')`; see the linked duplicate. Honestly, if safety is your concern, you can just use `ast.literal_eval`; it is almost clearer to me. The `unicode-escape` encoding is a bit arcane. – juanpa.arrivillaga Aug 12 '21 at 04:32
`ast.literal_eval("b'"+s+"'").decode('utf8')` -> `'李海玉'`. Shorter but not necessarily clearer. – Mark Tolonen Aug 16 '21 at 16:56

score 1 · Accepted Answer · answered Aug 12 '21 at 04:26

Use ast.literal_eval(). Unlike eval(), it only evaluates Python literals, so it cannot execute arbitrary code.

You also don't need to call bytes(), since literal_eval() already returns a bytes object.

import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
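To illustrate the safety claim, here is a minimal sketch that wraps the call in a helper and feeds it a string that tries to smuggle in a function call. The helper name `decode_escaped` and the `hostile` input are made up for this example. Note that literal_eval() still parses Python syntax, so it can raise errors on malformed or pathological input.

```python
import ast

def decode_escaped(s: str) -> str:
    """Hypothetical helper: treat s as the body of a bytes literal, then decode it as UTF-8."""
    raw = ast.literal_eval(f"b'{s}'")  # parses literals only, never calls functions
    return raw.decode('utf-8')

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
print(decode_escaped(s))  # 李海玉

# An input that closes the quote and appends a call; eval() would run it.
hostile = "' + __import__('os').getcwd().encode() + b'"
try:
    decode_escaped(hostile)
    rejected = False
except (ValueError, SyntaxError):
    # literal_eval refuses anything that is not a plain literal
    rejected = True
print(rejected)  # True
```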

answered Aug 12 '21 at 04:26

Barmar

Thank you very much! I had tried literal_eval before, but it raised an error because of the bytes() call. – yternal Aug 12 '21 at 04:35
This answer would break if `s` contains a single quote, for example. The linked answer is a safer approach IMHO, without involving parsing of Python syntax. – blhsing Aug 12 '21 at 04:47
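The single-quote problem can be reproduced with a small sketch, shown next to the encode/decode round trip mentioned in the comments on the question. The helper name `decode_octal_escapes` and the sample string `quoted` are assumptions for illustration. The round trip also assumes the input contains only characters that fit in Latin-1, otherwise the first `encode` raises.

```python
import ast

def decode_octal_escapes(s: str) -> str:
    """Hypothetical helper: interpret backslash escapes in s, then decode the bytes as UTF-8."""
    # unicode_escape turns each \ooo escape into the code point with that value;
    # latin-1 maps code points 0-255 back to the same byte values.
    raw = s.encode('latin-1').decode('unicode_escape').encode('latin-1')
    return raw.decode('utf-8')

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
quoted = "O'Brien \\346\\235\\216"  # contains a single quote

try:
    ast.literal_eval(f"b'{quoted}'")
    literal_eval_failed = False
except (SyntaxError, ValueError):
    # the quote ends the bytes literal early, so parsing fails
    literal_eval_failed = True

print(literal_eval_failed)           # True
print(decode_octal_escapes(quoted))  # O'Brien 李
print(decode_octal_escapes(s))       # 李海玉
```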

score 0 · Answer 2 · edited Aug 12 '21 at 06:18

This might be what you are hoping to get:

'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape')
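As far as I can tell, that expression on its own stops one step short: it yields a string whose code points equal the byte values, not the decoded text. A minimal sketch of the intermediate value and the extra step, with `intermediate` and `result` as names chosen for this example:

```python
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

# Each \ooo escape becomes one character in the range U+0000..U+00FF.
intermediate = s.encode('utf8').decode('unicode-escape')
print(len(intermediate))  # 9 characters, one per original byte

# Map those characters back to bytes, then decode the bytes as UTF-8.
result = intermediate.encode('latin-1').decode('utf-8')
print(result)  # 李海玉
```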

edited Aug 12 '21 at 06:18

imxitiz

answered Aug 12 '21 at 05:40

Joran Beasley

score -1 · Answer 3 · answered Aug 12 '21 at 04:24

You can do decoded_string = s.decode("utf8")

answered Aug 12 '21 at 04:24

PWR KANI1447

`decode()` is a bytes method, not a string method. – Barmar Aug 12 '21 at 04:25
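A short sketch of the point made in this comment, assuming Python 3: calling `decode()` on a `str` raises `AttributeError`, while the same call on a `bytes` object works. The variable names are illustrative.

```python
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

try:
    s.decode('utf8')  # str objects have no decode() method in Python 3
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__)  # AttributeError

# bytes objects do have decode()
decoded = b'\xe6\x9d\x8e'.decode('utf-8')
print(decoded)  # 李
```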