When I output some Chinese characters in Python (pandas), they show up as below:

\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5\xe6\x98\xaf\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x95\x85\xe9\x9a\x9c\xe7\x81\xaf\xef\xbc\x8c\xe6\xa3\x80\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x8f\x92\xe5\xa4\xb4\xe6\x98\xaf\xe5\x90\xa6\xe6\x8e\xa5\xe8\x99\x9a\xef\xbc\x8c\xe7\x84\xb6\xe5\x90\x8e\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe5\x86\x85\xe7\xae\xa1\xe9\x81\x93\xe5\x8e\x8b\xe5\x8a\x9b\xe6\x98\xaf\xe5\x90\xa6\xe7\xac\xa6\xe5\x90\x88\xe6\xad\xa3\xe5\xb8\xb8\xe5\x80\xbc\xe3\x80\x82

What is the encoding format? As far as I know, it is not Unicode. Thanks!

Daming Lu
  • Try putting `# -*- coding: utf-8 -*-` at the top of your Python source file to force Python into UTF-8 – Ben Jul 13 '18 at 22:24
  • that's hexadecimal – joel Jul 13 '18 at 22:25
  • @Ben A coding directive only affects how the interpreter decodes the script itself; it has no effect on what the script does to external data that it reads or writes. – PM 2Ring Jul 13 '18 at 22:25
  • That looks like UTF-8 encoded Chinese to me, although I don't read Chinese. 这种情况是油泵故障灯,检查一下油泵插头是否接虚,然后查一下油泵内管道压力是否符合正常值。 – PM 2Ring Jul 13 '18 at 22:28
  • @PM2Ring I'm assuming he's doing something like `print('你好')` and getting hex output. I don't have a lot of encoding problems, so I could very well be wrong – Ben Jul 13 '18 at 22:28
  • It is hexadecimal. There are tools online to convert hexadecimal to text :D – Daming Lu Jul 13 '18 at 23:27
  • Surely those online tools want to know what the encoding is as well? – Jongware Jul 14 '18 at 00:01

3 Answers

The output you are seeing is the representation of a bytes object. To decode it into text, call output.decode('utf-8').

For example:

output = b'\xe8\xbf\x99\xe7...'
unicode_output = output.decode('utf-8')
print(unicode_output)

This would then output the non-Latin characters (I cannot include them here because it counts as spam).

Another way to do this in one line would be: print(b'\xe8\xbf\x99\xe7...'.decode('utf-8')).
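As a minimal, runnable sketch of the decoding step (Python 3 assumed), the snippet below uses only the first twelve bytes of the output from the question, which form four complete characters, so the literal is not cut off in the middle of a character. The variable names are illustrative.

```python
# First 12 bytes of the output shown in the question:
# four complete UTF-8 encoded characters, 3 bytes each.
raw = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

# bytes -> str: interpret the byte sequence as UTF-8 text.
text = raw.decode('utf-8')

print(type(raw).__name__, len(raw))    # length counted in bytes
print(type(text).__name__, len(text))  # length counted in characters
print(text)
```

Each of these characters occupies three bytes in UTF-8, which is why the bytes object is three times as long as the resulting string.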

However, if that doesn't work, it is probably because your output isn't a bytes object but a string that contains the escape sequences. In that case, there is another solution.

output = '\xe8\xbf\x99\xe7...'
exec('print(b\''+ output + '\'.decode(\'utf-8\'))')

That should be able to fix it. Hope you got something useful out of this. Have a good day!
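In Python 3 the same result can usually be obtained without exec, which avoids running arbitrary text as code. This is only a sketch, and which of the two cases applies is an assumption about your data: either the string holds literal backslash escapes (a backslash, an x, and two hex digits), or each character of the string stands for one byte. The helper names below are hypothetical.

```python
def decode_escaped_text(s):
    # Case 1: s contains literal backslash escapes in plain ASCII.
    # unicode_escape turns each escape into the character with that
    # code point; latin-1 then maps those characters back to single bytes.
    raw = s.encode('ascii').decode('unicode_escape').encode('latin-1')
    return raw.decode('utf-8')


def decode_mojibake(s):
    # Case 2: every character of s has a code point in 0-255 and
    # really stands for one byte of UTF-8 data.
    return s.encode('latin-1').decode('utf-8')


escaped = r'\xe8\xbf\x99\xe7\xa7\x8d'  # raw string: 24 ASCII characters
mojibake = '\xe8\xbf\x99\xe7\xa7\x8d'  # 6 characters, one per byte

print(decode_escaped_text(escaped))
print(decode_mojibake(mojibake))
```

Both calls should print the same two characters. Note that decode_escaped_text assumes the input is pure ASCII; a string that already contains non-ASCII characters would raise UnicodeEncodeError at the first step.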

MilkyWay90

This is a bytes object containing valid UTF-8 encoded Chinese text (as far as I can trust Google Translate).

If it's a string literal from your code, add # -*- coding: utf-8 -*- as the first line of your Python file.

If it's external data, here's how to convert it to text (str type): bytes_text.decode("utf-8")
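For external data that comes from a file, a small sketch of the difference between reading bytes and reading text may help. It assumes Python 3 and a UTF-8 encoded file; the file name is made up for the illustration, and a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

# Four UTF-8 encoded characters taken from the start of the question's output.
raw = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name

    # Write the raw bytes so the file content is known exactly.
    with open(path, 'wb') as f:
        f.write(raw)

    # Binary mode returns bytes: this is what prints as \x.. escapes.
    with open(path, 'rb') as f:
        as_bytes = f.read()

    # Text mode with an explicit encoding returns an already decoded str.
    with open(path, encoding='utf-8') as f:
        as_text = f.read()

print(repr(as_bytes))
print(as_text)
```

Passing the encoding explicitly when opening the file avoids depending on the platform's default encoding.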

Victor Sergienko

raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85 . . .'

With raw_bytes being a <class 'bytes'> object containing your hex-escaped bytes, you can call decode on raw_bytes to get a <class 'str'> representation of your characters.

string_text = raw_bytes.decode("utf-8")
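The \x escapes are only how Python displays the bytes; which text they represent depends on the codec passed to decode. The sketch below (Python 3 assumed, variable names illustrative) decodes the same bytes with two codecs and shows what happens when the data is cut off in the middle of a character.

```python
raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d'  # two complete UTF-8 characters

as_utf8 = raw_bytes.decode('utf-8')      # the intended text: 2 characters
as_latin1 = raw_bytes.decode('latin-1')  # never fails, but gives 6 wrong characters

# Cut the data after 4 bytes, i.e. in the middle of the second character.
truncated = raw_bytes[:4]
try:
    truncated.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for the undecodable part instead of raising.
lenient = truncated.decode('utf-8', errors='replace')

print(len(as_utf8), len(as_latin1))
print(ascii(as_latin1))
print(strict_failed)
print(ascii(lenient))
```

A decode call that succeeds therefore does not prove the codec was the right one, which is why a hex-to-text tool also needs to be told the encoding.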

rigsby