0

I have a string that holds UTF-8 bytes written out as literal octal escape sequences:

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

I need to decode it, but the only way I have found so far is this:

result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')

Using eval() is not safe, so is there a better way?

yternal
  • Use `ast.literal_eval()` instead of the unsafe `eval()` – Barmar Aug 12 '21 at 04:24
  • A super roundabout way: `s.encode('latin-1').decode("unicode_escape").encode('latin-1').decode('utf-8')`; see the linked duplicate. Honestly, if safety is your concern, you can just use `ast.literal_eval`; it is almost clearer to me. The `unicode-escape` encoding is a bit arcane – juanpa.arrivillaga Aug 12 '21 at 04:32
  • `ast.literal_eval("b'"+s+"'").decode('utf8')` -> `'李海玉'`. Shorter but not necessarily clearer. – Mark Tolonen Aug 16 '21 at 16:56
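For anyone puzzled by the chained encode/decode calls in the comment above, here is a rough step-by-step sketch of the same round trip. The helper name `decode_octal_escapes` is made up for illustration, and the sketch assumes `s` contains only characters that fit in Latin-1 (as the string in the question does).

```python
def decode_octal_escapes(s: str) -> str:
    """Turn literal octal escapes such as \\346 into the UTF-8 text they spell out."""
    # Step 1: get a bytes object; assumes every character of s fits in Latin-1.
    raw = s.encode('latin-1')
    # Step 2: interpret the backslash escapes; each \ooo becomes one code point 0-255.
    escaped = raw.decode('unicode_escape')
    # Step 3: map those code points back to the single bytes they stand for.
    utf8_bytes = escaped.encode('latin-1')
    # Step 4: decode the actual UTF-8 byte sequence.
    return utf8_bytes.decode('utf-8')


s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
print(decode_octal_escapes(s))  # 李海玉
```

The Latin-1 steps are there because Latin-1 maps code points 0-255 to the same byte values one to one, so it works as a lossless bridge between str and bytes.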

3 Answers

1

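Use ast.literal_eval(). Unlike eval(), it only parses Python literals and does not execute arbitrary code.

You also don't need to call bytes(), since literal_eval() already returns a bytes object for a b'...' literal.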

import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
Barmar
  • Thank you very much! I tried literal_eval before, but it reported an error because of the bytes() call – yternal Aug 12 '21 at 04:35
  • This answer would break if `s` contains a single quote, for example. The linked answer is a safer approach IMHO, without involving parsing of Python syntax. – blhsing Aug 12 '21 at 04:47
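To illustrate the concern in the last comment, here is a small sketch showing the f-string approach failing on a single quote, next to one possible alternative that never parses Python syntax. This is not the linked answer's code; `decode_octal_utf8` and `_OCTAL_ESCAPE` are names invented for this example, and it only handles three-digit octal escapes in the range 000-377 (other escapes such as \n are left untouched).

```python
import ast
import re

# A backslash followed by a three-digit octal value in 000-377, i.e. one byte.
_OCTAL_ESCAPE = re.compile(rb'\\([0-3][0-7]{2})')


def decode_octal_utf8(s: str) -> str:
    """Replace each literal \\ooo escape with its byte, then decode as UTF-8."""
    raw = s.encode('utf-8')
    # int() accepts bytes when a base is given, so m.group(1) can be used directly.
    unescaped = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode('utf-8')


tricky = "it's \\346\\235\\216"

try:
    # The embedded quote ends the b'...' literal early, so parsing fails.
    ast.literal_eval(f"b'{tricky}'")
    literal_eval_ok = True
except SyntaxError:
    literal_eval_ok = False

print(literal_eval_ok)
print(decode_octal_utf8(tricky))
```

Because the input is treated purely as data, quotes and other Python syntax in `s` have no special meaning here.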
0

This might be what you are hoping for:

'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape').encode('latin-1').decode('utf8')
imxitiz
Joran Beasley
-1

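If s were a bytes object, you could do decoded_string = s.decode("utf8"). In Python 3 a str has no decode() method, so s has to be converted to bytes first.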