
I've got a string of bytes: '\udcd0\udca0\udcd0\udcbe\udcd1\udc81\udcd0\udcbd\udcd0\udcb5\udcd1\udc84\udcd1\udc82\udcd1\udc8c'

If I do:

b'\udcd0\udca0\udcd0\udcbe\udcd1'.decode("utf8")

I receive:

'\\udcd0\\udca0\\udcd0\\udcbe\\udcd1'
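The doubled backslashes are expected: `\u` is not a recognized escape sequence inside a bytes literal, so Python keeps the backslash and each `\udcd0` becomes six ASCII bytes rather than one code point. The sketch below writes those bytes out explicitly (doubled backslashes avoid the invalid-escape warning newer Python versions emit) and contrasts them with a `str` literal, where `\udcd0` really is a single code point.

```python
# What the question's b'\udcd0...' literal actually contains: because '\u' is
# not an escape in bytes literals, the backslash and 'u' are kept as bytes.
as_bytes = b'\\udcd0\\udca0\\udcd0\\udcbe\\udcd1'
print(len(as_bytes))  # 30: five 6-character sequences, one byte per character

# Decoding plain ASCII bytes as UTF-8 just gives the same characters back.
decoded = as_bytes.decode('utf8')
print(len(decoded))   # 30
print(decoded == '\\udcd0\\udca0\\udcd0\\udcbe\\udcd1')  # True

# In a str literal, by contrast, '\udcd0' is one code point, U+DCD0.
as_str = '\udcd0\udca0\udcd0\udcbe\udcd1'
print(len(as_str))          # 5
print(hex(ord(as_str[0])))  # 0xdcd0
```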

I can't decode it because I don't know how it was encoded. At least we can see that it's not UTF-8, because the symbols I expect to see have a \x23-like representation. How can I discover the encoding and decode it?

P.S. I expect to see Russian characters there.

  • http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file – jkr Nov 05 '16 at 21:28
  • @Jakub Thank you very much, but for some reason I can't install any of the suggested libraries. Are there any other ways? –  Nov 05 '16 at 21:39
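Without third-party packages, one rough option is to try a few standard-library codecs on the raw bytes and keep the result that looks most like Russian. This is a minimal sketch, assuming you already have a `bytes` object; the candidate list, the helper names `cyrillic_ratio` and `guess_cyrillic`, and the "share of Cyrillic letters" score are hypothetical illustrations, not a real detection algorithm like chardet's. Single-byte codecs such as cp1251 and koi8_r accept almost any byte sequence, so strict decoding rarely rules them out on its own; the score is what separates them. For illustration, `raw` holds the low byte of each `\udcXX` character in the question.

```python
# Hypothetical candidate list: common encodings for Russian text (stdlib codecs only).
CANDIDATES = ('utf-8', 'cp1251', 'koi8_r', 'cp866', 'iso8859_5')


def cyrillic_ratio(text: str) -> float:
    """Share of characters in the basic Cyrillic letter range U+0410..U+044F (excludes ё)."""
    if not text:
        return 0.0
    return sum(0x0410 <= ord(ch) <= 0x044F for ch in text) / len(text)


def guess_cyrillic(raw: bytes, candidates=CANDIDATES):
    """Hypothetical helper: return (codec, text, ratio) for the best candidate, or None."""
    results = []
    for codec in candidates:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codec
        results.append((codec, text, cyrillic_ratio(text)))
    if not results:
        return None
    # max() keeps the first of equal scores, so earlier candidates win ties.
    return max(results, key=lambda r: r[2])


raw = b'\xd0\xa0\xd0\xbe\xd1\x81\xd0\xbd\xd0\xb5\xd1\x84\xd1\x82\xd1\x8c'
codec, text, ratio = guess_cyrillic(raw)
print(codec, ratio)  # utf-8 1.0
```

A high score only suggests a plausible encoding; the decoded text still needs a human look before you trust it.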

1 Answer


I am able to print your string this way, but the output is all "invalid characters":

>>> string = u'\udcd0\udca0\udcd0\udcbe\udcd1\udc81\udcd0\udcbd\udcd0\udcb5\udcd1\udc84\udcd1\udc82\udcd1\udc8c'
>>> print string
����������������

According to Charbase.com, your first character (u'\udcd0') is an invalid character: U+DCD0 is a lone low-surrogate code point, which is not a valid character on its own. So maybe the output is correct.
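One possible explanation, offered as an assumption rather than a confirmed diagnosis: every character in the question falls in U+DC80 to U+DCFF, which is exactly the range Python 3's `surrogateescape` error handler uses to carry undecodable bytes 0x80 to 0xFF inside a `str` (as can happen, for example, with file names or command-line arguments read under an ASCII locale). If that is where the string came from, encoding it with the same handler gives the original bytes back, and here those bytes decode as UTF-8 to the Russian word "Роснефть". The helper names `looks_surrogate_escaped` and `recover_bytes` are hypothetical.

```python
# Assumption: this str came from Python 3 decoding raw bytes with the
# 'surrogateescape' error handler, which maps each undecodable byte
# 0x80..0xFF to the lone surrogate U+DC80..U+DCFF.
s = '\udcd0\udca0\udcd0\udcbe\udcd1\udc81\udcd0\udcbd\udcd0\udcb5\udcd1\udc84\udcd1\udc82\udcd1\udc8c'


def looks_surrogate_escaped(text: str) -> bool:
    """Hypothetical check: True if every character is in U+DC80..U+DCFF."""
    return bool(text) and all(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


def recover_bytes(text: str) -> bytes:
    """Hypothetical helper: undo surrogateescape, turning U+DCxx back into byte 0xxx."""
    # The UTF-8 encoder cannot encode lone surrogates, so the
    # surrogateescape handler emits the original byte for each one instead.
    return text.encode('utf-8', 'surrogateescape')


print(looks_surrogate_escaped(s))  # True
raw = recover_bytes(s)
print(raw)  # b'\xd0\xa0\xd0\xbe\xd1\x81\xd0\xbd\xd0\xb5\xd1\x84\xd1\x82\xd1\x8c'

# The recovered bytes are valid UTF-8; the result is 'Роснефть'.
text = raw.decode('utf-8')
print(ascii(text))  # '\u0420\u043e\u0441\u043d\u0435\u0444\u0442\u044c'
```

If the recovered bytes had not been valid UTF-8, the same bytes could be tried against other Cyrillic encodings such as cp1251 or koi8_r.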

jkr