1

I'm a Python 3 user, and I'm having a problem converting bytes to a string.

First, I get data from a server as bytes.

[Byte data] : b'\xaaD\x12\x1c+\x00\x00 \x18\x08\x00\x00\x88\xb4\xa2\x07\xf8\xaf\xb6\x19\x00\x00\x00\x00\x03Q\xfa3/\x00\x00\x00\x1d\x00\x00\x00\x86=\xbd\xc9~\x98uA>\xdf#=\x9a\xd8\xdb\x18\x1c_\x9c\xc1\xe4\xb4\xfc;'

This data doesn't decode with any of the encodings I've tried, such as utf-8 or unicode-escape.

Does anyone know how to handle this data?

spritecodej

2 Answers

3

As @MrE says, you need to use the bytes.decode method, or you can pass an encoding to the str() constructor. You can ignore decoding errors as follows, but the result is still gibberish:

>>> x = b'\xaaD\x12\x1c+\x00\x00 \x18\x08\x00\x00\x88\xb4\xa2\x07\xf8\xaf\xb6\x19\x00\x00\x00\x00\x03Q\xfa3/\x00\x00\x00\x1d\x00\x00\x00\x86=\xbd\xc9~\x98uA>\xdf#=\x9a\xd8\xdb\x18\x1c_\x9c\xc1\xe4\xb4\xfc;'
>>> x.decode("UTF-8", errors="ignore")
'D\x12\x1c+\x00\x00 \x18\x08\x00\x00\x07\x19\x00\x00\x00\x00\x03Q3/\x00\x00\x00\x1d\x00\x00\x00=~uA>#=\x18\x1c_;'
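To see what errors="ignore" actually does to the data, it may help to compare it with two other standard error handlers. This is a minimal sketch on the first ten bytes of the data above; the variable names are illustrative only:

```python
# Shortened sample of the data from the question (first 10 bytes).
sample = b'\xaaD\x12\x1c+\x00\x00 \x18\x08'

# "ignore" silently drops undecodable bytes, so information is lost.
ignored = sample.decode("utf-8", errors="ignore")
# "replace" substitutes U+FFFD for each undecodable byte.
replaced = sample.decode("utf-8", errors="replace")
# "backslashreplace" keeps each undecodable byte visible as a \xNN escape.
escaped = sample.decode("utf-8", errors="backslashreplace")

for name, result in [("ignore", ignored),
                     ("replace", replaced),
                     ("backslashreplace", escaped)]:
    print(f"{name:17}{len(result):3} chars  {result!r}")
```

None of these handlers recovers the original meaning; they only change how the undecodable bytes are represented in the resulting string.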

But as you said, you don't know the encoding. Taking some tips from this question, I looped through the available encodings but didn't see anything intelligible, e.g.:

>>> for c in codec_list:
...     try:
...         print(str(x, c, errors="ignore"))
...     except Exception:  # skip codecs that are unknown or can't decode bytes
...         pass
...
D+         Q3/      =~uA>#=_;
枋+     揣灝    Q3/      =褕~uA>#=嵫_鍵渝;
枋+     揣灝    Q3/      =褕~A>#=_銧;
¡à    h©s8®¶    é³      f¨I=qÍ ÿªQû¬æAU©Ü
ד    h©s8®¶    י³      f¨I=qאQ¬AU©
¬D+     ê┤ó°»╢    Q·3/      å=╜╔~ÿuA>▀#=Ü╪█_£┴Σ┤ⁿ;
...
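The loop above relies on a codec_list that is not shown. One possible way to build such a list, assuming the canonical names in the standard library's encodings.aliases table are a reasonable stand-in, is sketched below. Unlike the loop above, it decodes strictly, so only codecs that accept every byte are kept:

```python
from encodings.aliases import aliases

# Shortened sample of the data from the question (first 10 bytes).
sample = b'\xaaD\x12\x1c+\x00\x00 \x18\x08'

# Assumption: the canonical codec names from the alias table stand in
# for the codec_list used above.
codec_list = sorted(set(aliases.values()))

successes = {}
for codec in codec_list:
    try:
        # Strict decoding: any undecodable byte raises an error.
        successes[codec] = sample.decode(codec)
    except (LookupError, ValueError):
        # LookupError: codec is unavailable or is not a text encoding.
        # ValueError covers UnicodeDecodeError and other UnicodeError cases.
        continue

print(f"{len(successes)} of {len(codec_list)} codecs decoded without error")
```

A successful decode is not evidence of the right encoding: single-byte codecs such as latin_1 map every possible byte to some character, so they never fail.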

Then I had a look at the Universal Character Encoding Detection module, chardet, also linked above, but that couldn't find anything:

>>> import chardet
>>> chardet.detect(x)
{'encoding': None, 'confidence': 0.0, 'language': None}

So perhaps you need to look at how you're obtaining your data. The chardet documentation has some simple, useful examples.
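If the data turns out not to be text at all (an assumption; nothing in the question confirms it), a hex dump is a string form that works for any bytes and can be compared against the server's documentation. A minimal sketch on the first five bytes:

```python
# First five bytes of the data from the question.
sample = b'\xaaD\x12\x1c+'

# bytes.hex() gives a lossless, printable representation of any byte string.
hex_text = sample.hex()
# Iterating over bytes yields the integer value of each byte.
values = list(sample)
# bytes.fromhex() reverses the conversion exactly.
restored = bytes.fromhex(hex_text)

print(hex_text)
print(values)
print(restored == sample)
```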

import random
0

You need to decode the byte data:

byte_data.decode("utf-8")
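A short sketch of what this call does in both cases: it succeeds on bytes that really are UTF-8, and raises UnicodeDecodeError otherwise. The example string and the `failure` variable are illustrative, not from the question:

```python
# Bytes that really are UTF-8 decode cleanly.
text = "héllo".encode("utf-8").decode("utf-8")

# The first bytes of the data from the question are not valid UTF-8.
byte_data = b'\xaaD\x12\x1c+'
try:
    byte_data.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte; exc.reason says why.
    failure = (exc.start, exc.reason)

print(text)
print(failure)
```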

MrE
  • However, it seems your data starts with an invalid character, so you need to figure out what it was encoded in to begin with – MrE Jun 23 '17 at 00:06