Encoding problems with MySQLdb query results in Python

Question

I'm using the library MySQLdb for Python to access a database with entries in Portuguese, with a bunch of accents, which I then save to an Excel file using xlsxwriter. When I'm closing the workbook to save it, I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xed in position 59: invalid continuation byte

The result it's complaining about is:

u'QNO XX Conjunto YY, No. Casa ZZ, CEP: AAAAAAAA, Bras\xedlia /DF'

Specifically, it should be Brasília instead of Bras\xedlia. How can I get the outputs to be encoded in a friendlier way? Do I have to replace \xed and the like with each possible accent individually?

--EDIT:

I know 0xED is í in latin-1 (iso-8859-1), and given the language (and information from the people in charge of the db) I think that's the right encoding. How do I turn a string that goes 'Bras\xedlia' into one that goes 'Brasília' in general, knowing that?

--EDIT:

If I try to use str(that thing) what I get is

'ascii' codec can't encode character u'\xed' in position 52: ordinal not in range(128)

You can instruct MySQL to translate the results to utf-8 by a `SET NAMES 'UTF8'` query. [More info](http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html). — Kenney, Feb 11 '16 at 17:16
Your text is not UTF8 encoded, it looks more like an iso-8859-x variant. You should identify the encoding and pass it as the `charset` argument to `connect()`, or do an explicit `decode()` on the string. — Klaus D., Feb 11 '16 at 17:19
I have tried several explicit decodes on the string and either the `\xed` remains or it's replaced by some other kind of `\x`-something, never by the actual accent. Passing a charset argument gives me `Can't initialize character set`. — Pedro Carvalho, Feb 11 '16 at 17:32

score 0 · Answer 1 · answered Feb 11 '16 at 18:29

You need to change the charset of your fields and your table.

To do so run one of the following:

mysql> ALTER TABLE <table> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

OR

mysql> ALTER TABLE <table> MODIFY <col> VARCHAR(50) CHARACTER SET utf8;

I would prefer the first one.

Lastly, as Klaus D. said, you need to connect to MySQL with charset="utf8"; check the link.
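A rough sketch of what passing the charset at connect time could look like. The helper names (build_connect_kwargs, fetch_rows) and the credentials in the usage note are made up for illustration; charset and use_unicode are keyword arguments of MySQLdb.connect(). Note that the charset value has to be a MySQL character set name such as utf8 or latin1, not a Python codec name.

```python
def build_connect_kwargs(host, user, passwd, db, charset="utf8"):
    """Collect keyword arguments for MySQLdb.connect() (hypothetical helper)."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": charset,    # MySQL character set name for the connection
        "use_unicode": True,   # ask the driver to return text columns as unicode
    }


def fetch_rows(query, **connect_kwargs):
    """Run one query and return all rows (hypothetical helper)."""
    import MySQLdb  # imported here so the sketch loads even without the driver

    conn = MySQLdb.connect(**connect_kwargs)
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()
```

Usage would be along the lines of fetch_rows("SELECT ...", **build_connect_kwargs("localhost", "user", "secret", "mydb")), with the query and credentials replaced by your own.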

score 0 · Accepted Answer · answered Feb 13 '16 at 19:31

It sounds like a problem with xlsxwriter, not Python or MySQL.

0xED says the bytes coming in are latin1, not utf8, not ascii. If you are stuck with 0xED, then do SET NAMES latin1 so that Python will communicate correctly with MySQL. It does not matter whether the tables/columns are CHARACTER SET latin1 or utf8; SET NAMES will cause the suitable conversion (if any) to happen during INSERT/SELECT/etc.
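To illustrate the point about 0xED, here is a small sketch that needs no database. It assumes Python 3 syntax and uses a shortened copy of the address from the question as a byte string. It shows that the byte decodes to í as latin-1, that decoding it as UTF-8 fails, and that \xed in a unicode repr is already the accented character, so no per-accent replacement is needed.

```python
# Bytes as a latin1 connection would deliver them (shortened sample).
raw = b"Bras\xedlia /DF"

# Decoding as latin-1 maps byte 0xED to U+00ED (i with acute accent).
text = raw.decode("latin-1")

# "\xed" is only how repr() displays the character; the string itself
# already contains the accented letter.
assert text == "Bras\xedlia /DF"

# Decoding the same bytes as UTF-8 fails, because 0xED would have to be
# followed by continuation bytes and "l" is not one.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Once decoded, the text can be re-encoded as UTF-8 for libraries that expect it.
utf8_bytes = text.encode("utf-8")
```

If a library receives the raw latin-1 bytes and assumes UTF-8, it would hit the same kind of "invalid continuation byte" error as in the question, which is why decoding (or having the driver return unicode) before handing values on is worth trying.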
