
I am using Python 2.7 to scrape a Chinese website.

How can I convert the unicode object below into a string?

Calling str() on it does not work; it raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Thanks in advance,

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
wim
Perry Zhuang
  • Possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – ImportanceOfBeingErnest Nov 14 '16 at 21:46

1 Answer


Your data was already encoded (these are UTF-8 bytes), so it should be a bytes object, not a unicode object. Try to solve that problem at the source instead. That is, the repr of your scraped data should look like this:

    '\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

not like this:

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
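
One plausible way to end up with the second form is that the scraper decoded UTF-8 bytes with a single-byte codec such as Latin-1. This is an assumption about the scraping code, which the question does not show. A minimal sketch of the difference, where `raw` is a hypothetical stand-in for the bytes the server sent:

```python
# Hypothetical raw response body: the UTF-8 encoding of the page text.
raw = b'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

# Wrong codec: Latin-1 maps every byte to one code point, so the
# result is a unicode object that still "contains" the UTF-8 bytes.
wrong = raw.decode('latin-1')

# Right codec: each three-byte sequence becomes one Chinese character.
right = raw.decode('utf-8')

print(repr(wrong))
print('%d code points vs %d' % (len(wrong), len(right)))
```

If that is what happened, decoding with the page's real encoding when the response is first read avoids the problem entirely.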

To recover the Chinese text from the unicode object, you can jump to bytes and back:

    >>> text = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
    >>> print text.encode('latin-1').decode('utf-8')

    中国深圳
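
If the same repair has to be applied to many scraped values, the round trip could be wrapped in a small helper. This is only a sketch: `fix_mojibake` is a hypothetical name, and it assumes the damage is always UTF-8 read as Latin-1. It leaves the text unchanged when the round trip fails, which is what happens for text that was decoded correctly in the first place.

```python
def fix_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up; return text unchanged if it does not apply."""
    try:
        # Latin-1 turns each code point below 256 back into the original byte,
        # then the bytes are decoded with the codec they were written in.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8: leave it alone.
        return text


scraped = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
fixed = fix_mojibake(scraped).strip()
print(repr(fixed))
```

This is a heuristic: a correctly decoded string that happens to also be valid UTF-8 when re-encoded as Latin-1 would be altered too, so fixing the decoding step in the scraper remains the safer option.
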
wim