I'm working on some Python code that gets a bunch of strings from an external API. They are regular Python byte strings containing odd backslash escapes such as \xe9. I think this means the actual data is UTF-8 but Python is treating it as ASCII (or some other incorrect encoding). The solution, I assume, is to run some operation on the string to "fix" it, but every conversion I know of raises an error:

>>> s = 'blah \xe9 blah'
>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 5: ordinal not in range(128)
>>> s.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 5: ordinal not in range(128)
priestc

1 Answer

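That's not UTF-8. In UTF-8, the byte 0xE9 starts a three-byte sequence and must be followed by continuation bytes, but here it is followed by a space. In Latin-1 (ISO 8859-1), 0xE9 on its own is é, so decode with that codec instead: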

>>> print 'blah \xe9 blah'.decode('latin-1')
blah é blah
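
If it is not known in advance whether the API sends UTF-8 or Latin-1, one option is to try UTF-8 first and fall back. This is only a sketch, written for Python 3 and assuming the API hands back raw bytes; decode_with_fallback is a hypothetical helper name, not part of any library. Because every byte value is valid Latin-1, the fallback never raises, but it can silently produce the wrong characters if the data is really in some other single-byte encoding such as cp1252.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return raw decoded with the first encoding that accepts it.

    Hypothetical helper: the order of `encodings` matters, because
    latin-1 accepts any byte sequence and therefore must come last.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode the data")


raw = b"blah \xe9 blah"           # the bytes from the question
text = decode_with_fallback(raw)  # UTF-8 fails, Latin-1 succeeds
print(text)
```

Genuine UTF-8 input, such as b"blah \xc3\xa9 blah", is decoded by the first codec and never reaches the Latin-1 fallback.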
Ignacio Vazquez-Abrams