How do I convert 'blah \xe9 blah' to 'blah é blah'

Question

I'm working on some Python code that gets a bunch of strings from an external API. The strings are regular Python strings containing backslash escapes such as '\xe9'. I think this means the actual data is UTF-8, but Python treats it as ASCII (or some other incorrect encoding). The solution, I assume, is to run some sort of conversion on the string to "fix it", but every conversion I know of raises an error:

>>> s = 'blah \xe9 blah'
>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 5: ordinal not in range(128)
>>> s.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 5: ordinal not in range(128)

`print 'blah \xe9 blah'.decode("latin-1")` – Padraic Cunningham Nov 29 '14 at 00:20

score 1 · Accepted Answer · answered Nov 29 '14 at 00:20

That's not UTF-8. The byte 0xE9 is é in Latin-1 (ISO-8859-1); in UTF-8, é would be the two bytes \xc3\xa9.

>>> print 'blah \xe9 blah'.decode('latin-1')
blah é blah
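For readers on Python 3, the same idea can be sketched with a `bytes` object. This is only an illustration and assumes the API payload arrives as raw bytes; the variable names are made up for the example.

```python
# Python 3 sketch: the API payload is modeled as a bytes object (assumption).
raw = b'blah \xe9 blah'

# 0xE9 is a complete character in Latin-1 (ISO-8859-1): U+00E9, "é".
text = raw.decode('latin-1')
print(text)  # blah é blah

# In UTF-8 the same character takes two bytes, 0xC3 0xA9.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # b'blah \xc3\xa9 blah'

# A lone 0xE9 is a UTF-8 lead byte that must be followed by continuation
# bytes, so decoding the original bytes as UTF-8 fails.
utf8_error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc  # keep a reference; `exc` is cleared after the block
    print(exc)
```

The round trip shows why `.decode('utf-8')` fails on the original data while `.decode('latin-1')` succeeds.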

Ignacio Vazquez-Abrams


What about this string: 'Al\xf3nnisos Town'? I still get an error – priestc Nov 29 '14 at 00:32
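The comment does not say which error appears, so the following is only a sketch. The byte 0xF3 is ó in Latin-1, so the same decoding applies to that string. `decode_api_bytes` is a hypothetical helper, written for Python 3, that assumes the API sends either UTF-8 or Latin-1; if the source actually uses another single-byte encoding, the fallback would silently produce wrong characters.

```python
def decode_api_bytes(raw):
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point,
        # so this fallback never raises.
        return raw.decode('latin-1')


town = decode_api_bytes(b'Al\xf3nnisos Town')
print(town)  # Alónnisos Town
```

Trying UTF-8 first is a design choice: valid UTF-8 is rarely produced by accident, whereas Latin-1 accepts any byte sequence and so cannot signal a wrong guess.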
