
I have a variable like this:

```python
>>> s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'
>>> print(s)
\320\227\320\264\320\260\320\275\320\270\320\265 \320\261\321\213\320\262\321\210\320\265\320\271
```

This contains the octal escape representation of the UTF-8 encoding of the string "Здание бывшей" (octal 320 227 = hex D0 97 = UTF-8 for "З"). How can I decode this string to "Здание бывшей"?
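To make the relationship concrete, here is a small illustrative check. Each `\ooo` group in the text stands for one byte written in base 8. Calling `.encode()` on the string does not help, because the backslashes and digits are ordinary characters:

```python
# Each \ooo group in the text is one byte written in base 8.
first_char = bytes([int("320", 8), int("227", 8)])
print(first_char)                   # b'\xd0\x97'
print(first_char.decode("utf-8"))   # З

# Encoding the string itself keeps the escapes as literal text:
# 8 characters (two backslashes plus six digits) become 8 ASCII bytes.
piece = '\\320\\227'
print(piece.encode("utf-8"))        # b'\\320\\227'
print(len(piece.encode("utf-8")))   # 8
```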

Karl Knechtel
Dhamo R
  • My question is different, @deceze. I can't write b'\320\227\320\264\320\260\320\275\320\270\320\265 \320\261\321\213\320\262\321\210\320\265\320\271' because I get the octal values dynamically, as a string object. – Dhamo R May 31 '18 at 09:52
  • https://stackoverflow.com/a/23173435/476…? No? What is the expected result? – deceze May 31 '18 at 09:54
  • The octal values are in a STRING object. I can't decode a string object without converting it into a bytes object, right? So if I convert the string object to bytes, the octal content changes. I have to convert the values in a string variable (which are already octal) to a bytes object without changing the octal values, so that I can decode it. – Dhamo R May 31 '18 at 09:57
  • True. Then you're probably looking for https://stackoverflow.com/a/24519338/476. – deceze May 31 '18 at 10:00
  • ;-; This one is entirely different. I think you don't get my question. I just want to convert a string object (containing octal values) into a bytes object. Example: str = "\320\320\320"; I have to make this into a bytes object like byte_str = b'\320\320\320' – Dhamo R May 31 '18 at 10:02
  • @DhamoR how did you get the string? – matt May 31 '18 at 10:04
  • How's it different exactly? From one of the answers there: `bytes('\\320\\227\\320', 'utf-8').decode('unicode_escape')` → 'Ð\x97Ð' – What result do you expect, why doesn't this technique work for you? – deceze May 31 '18 at 10:04
  • If you say `print` gives *\320\227\320*, then the correct literal for your string would be `"\\320\\227\\320"`, correct? – deceze May 31 '18 at 10:06
  • it actually represents Зданиебывшей in octal. you can check it here http://www.unit-conversion.info/texttools/octal/ – Dhamo R May 31 '18 at 10:06
  • So that is a string containing the octal representation of the UTF-8 encoding of the string Зданиебывшей…? You will have to clearly state that in your question. – deceze May 31 '18 at 10:11
  • `b = bytes([int(i, 8) for i in str.split("\\")[1:]])` – matt May 31 '18 at 10:15
  • I guess I gave that in the explanation, stating that the 'hello' value is a string object – Dhamo R May 31 '18 at 10:16
  • @matt Nope, it gives an empty byte array – Dhamo R May 31 '18 at 10:19
  • You should include the assignment of the string, to show what the actual string is. And the value you expect. – matt May 31 '18 at 10:19
  • Sorry that I didn't explain the question properly; I think it is clear now. @deceze, can you please remove the duplicate markers, as this question is not a duplicate? – Dhamo R May 31 '18 at 10:21
  • @DhamoR it works for me. `s = "\\320\\227"` `bytes([int(i, 8) for i in s.split("\\")[1:]])` gives me `b'\xd0\x97'` – matt May 31 '18 at 10:21
  • matt, you have encoded the string to form a set of new octal values, whereas my string object already has the octal values. And @deceze, thanks for the edit – Dhamo R May 31 '18 at 10:23
  • `b'\xd0\x97'.decode('utf-8')` → З… – deceze May 31 '18 at 10:24
  • Cool, got it. Thanks matt and deceze – Dhamo R May 31 '18 at 10:27
  • @DhamoR isn't that exactly what you want? Unless somebody points you to a decode from octal escaped values. – matt May 31 '18 at 10:27
  • Does this answer your question? [Convert "\x" escaped string into readable string in python](https://stackoverflow.com/questions/63218987/convert-x-escaped-string-into-readable-string-in-python) – Karl Knechtel Aug 05 '22 at 02:44

1 Answer


This is a bit of a hack.

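```python
s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'

# Split on the literal backslashes, skip the empty piece before the first one,
# and parse each remaining chunk as a base-8 byte value.
b = bytes([int(i, 8) for i in s.split("\\")[1:]])

print(b.decode("utf8"))
```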

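yields: Зданиебывшей

Note that the space between the two words is silently lost: splitting leaves the chunk `'265 '`, and `int()` ignores surrounding whitespace. Most other characters that are not part of an escape would make `int()` or `bytes()` raise a `ValueError`, so this only works reliably when the string consists entirely of escapes.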
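The split-based one-liner only handles strings made up entirely of escapes. A more tolerant sketch replaces only the escapes, using a regular expression over bytes. It assumes each escape is a backslash followed by exactly three octal digits and that everything else in the string is ASCII. `decode_octal_escapes` and `OCTAL_ESCAPE` are just illustrative names:

```python
import re

s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'

# Illustrative name: a backslash plus three octal digits in the range 000-377,
# so every match fits in a single byte.
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_octal_escapes(text: str) -> str:
    """Replace backslash-octal escapes with the bytes they denote, then decode as UTF-8."""
    # Assumes everything outside the escapes is ASCII, so it maps 1:1 onto bytes.
    raw = text.encode("ascii")
    # int() accepts bytes when a base is given, e.g. int(b"320", 8) == 208.
    unescaped = OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode("utf-8")


print(decode_octal_escapes(s))  # Здание бывшей
```

Because only the matched escapes are rewritten, the space (and any other plain ASCII) passes through unchanged.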

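Or use the codecs module. As far as I know, `codecs.escape_decode` is not part of the documented `codecs` API, so it could change without notice:

```python
import codecs

# escape_decode returns a (bytes, length consumed) tuple;
# keep only the bytes, then decode them as UTF-8.
b2 = codecs.escape_decode(s)[0]
print(b2.decode("utf8"))
```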

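This yields Здание бывшей, this time with the space preserved, because `escape_decode` copies characters that are not escapes through unchanged.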
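If you would rather not rely on `codecs.escape_decode`, you can chain the `unicode_escape` approach mentioned in the question's comments with two `latin-1` round trips. This uses only documented codecs. The sketch below assumes the input is pure ASCII (true here) and contains no other backslash sequences that `unicode_escape` would also interpret (such as `\n` or `\u....`):

```python
s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'

# 1. str -> bytes: s is pure ASCII, so latin-1 leaves it unchanged.
# 2. unicode_escape interprets each \ooo octal escape as one code point (0-255).
# 3. latin-1 maps those code points back to the identical byte values.
# 4. Those bytes are UTF-8, so decode them as such.
text = s.encode("latin-1").decode("unicode_escape").encode("latin-1").decode("utf-8")
print(text)  # Здание бывшей
```

The `latin-1` steps work because that codec maps code points 0-255 one-to-one onto byte values 0-255.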

matt