
I have a problem with MySQL 5.5 on OS X. I'm working on a multilingual project using MyISAM tables. The default character set is utf8 and the default collation is utf8_unicode_ci.

Italian and German are fine, but Spanish is not. I'm using Python to manipulate the data, with the PyMySQL driver, its `charset` option set to UTF-8 and `use_unicode` set to True.

Practically all the Spanish-specific letters come out garbled.

From the Python shell:

    >>> r
    ['Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4']
    >>> print r[0]
    Blas Pérez González, 4

After saving it to the database and fetching it again:

    >>> r
    (u'Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4')
    >>> print r[0]
    Blas Pérez González, 4

I'm really confused: it clearly seems to be the same Unicode string!
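In Python 2 terms, the first `r[0]` is a byte string (`str`) holding UTF-8 bytes, while the fetched value is a `unicode` object whose code points happen to equal those byte values, i.e. UTF-8 bytes that were decoded as Latin-1 somewhere along the way. A minimal Python 3 sketch of the difference, where `bytes` plays the role of Python 2 `str` and `str` plays the role of `unicode`; the Latin-1 step is an assumption that matches the `u'\xc3\xa9'` repr above:

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.

# What the first r[0] holds: UTF-8 encoded bytes.
utf8_bytes = b"Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4"

# What the fetched r[0] holds: text whose code points are U+00C3, U+00A9, ...
# (it would print as "Blas PÃ©rez GonzÃ¡lez, 4").
mojibake = "Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4"

# Decoding the bytes as UTF-8 gives the intended text.
good = utf8_bytes.decode("utf-8")

# Decoding the same bytes as Latin-1 instead reproduces the fetched value.
assert utf8_bytes.decode("latin-1") == mojibake

# Undoing that mistake (back to bytes via Latin-1, then UTF-8) recovers the text.
repaired = mojibake.encode("latin-1").decode("utf-8")
assert repaired == good == "Blas P\u00e9rez Gonz\u00e1lez, 4"
```

This is also why, in Python 2 on a UTF-8 terminal, printing the byte string looks right (the bytes are written out unchanged) while printing the `unicode` value does not (each of the two code points per accented letter is encoded separately).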

Thanks.

xaverras


It is better to use Java-style Unicode escapes, for example:

    u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')

See a similar question.

This ensures that the string really contains Unicode characters.
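The `.decode('unicode-escape')` call above is Python 2 syntax. A minimal Python 3 sketch of the same idea, assuming the escapes arrive as plain ASCII text (the `unicode_escape` codec treats any non-escaped byte as Latin-1, so it should only be fed ASCII):

```python
# Escaped form: literal backslash-u sequences, pure ASCII, safe to type anywhere.
escaped = "Blas P\\u00e9rez Gonz\\u00e1lez"

# Python 3 str has no .decode(), so go through bytes first.
text = escaped.encode("ascii").decode("unicode_escape")

# In source code, a \u escape in a normal literal gives the same result directly.
literal = "Blas P\u00e9rez Gonz\u00e1lez"
assert text == literal
```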

Now for the actual problem: check the table definition in MySQL with `SHOW CREATE TABLE the_table;` (plain `DESCRIBE the_table` does not list character sets; `SHOW FULL COLUMNS FROM the_table;` shows each column's collation). The character set can also be set per column, so verify that the table and its columns really are utf8.

For testing, store `u'Blas P\\u00e9rez Gonz\\u00e1lez'.decode('unicode-escape')` in the database. Then you know that the correct Unicode string was stored. If the database, table, and column definitions are correct, only the retrieval, not the storing, may be at fault.
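The storage test can be scripted. A minimal sketch that accepts any DB-API 2.0 connection (for example one from `pymysql.connect(..., charset='utf8')`); the table and column names are hypothetical and should point at a scratch table, because MyISAM does not support transactions and the inserted row stays behind. The `__main__` demo uses an in-memory SQLite database purely as a stand-in so the sketch runs without a MySQL server.

```python
import sqlite3

# Known-good value built only from escapes, so no editor or terminal
# encoding can alter it before it reaches the database.
EXPECTED = "Blas P\u00e9rez Gonz\u00e1lez"


def roundtrip_check(conn, table, column, placeholder="%s"):
    """Store EXPECTED in table.column, read the column back, and compare.

    `table` and `column` are trusted, hypothetical identifiers (ideally an
    empty scratch copy of the real table). `placeholder` is the driver's
    parameter marker: "%s" for PyMySQL, "?" for sqlite3.
    Returns (ok, values_read_back).
    """
    cur = conn.cursor()
    cur.execute(
        f"INSERT INTO {table} ({column}) VALUES ({placeholder})", (EXPECTED,)
    )
    conn.commit()
    cur.execute(f"SELECT {column} FROM {table}")
    values = [row[0] for row in cur.fetchall()]
    return EXPECTED in values, values


if __name__ == "__main__":
    # Stand-in demo: SQLite in memory instead of MySQL.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE scratch (txt TEXT)")
    ok, values = roundtrip_check(demo, "scratch", "txt", placeholder="?")
    # ascii() shows the exact code points regardless of terminal encoding.
    print(ok, [ascii(v) for v in values])
```

If `ok` is `False` against the real table, `ascii(values)` shows exactly which code points came back, which is easier to compare than printed output.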

Joop Eggen
  • Could you please explain a bit more? I can't encode the string in any other format until I'm able to decode the current format first. – xaverras Sep 22 '12 at 20:57
  • Many thanks, that seems to be the problem. At least, if I save the string with Java-style Unicode escapes and retrieve it again, it is displayed correctly: `u'Blas P\xe9rez Gonz\xe1lez'`; without the Java-style escapes it was `u'Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4'`. My question now is how to retrieve all the wrong records from the database, convert them to Java-style escapes, and save them again. I've been googling for a while, unfortunately without success. – xaverras Sep 23 '12 at 09:33
  • What confuses me: `print 'Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4'` is displayed correctly, but `print u'Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4'` is not. And if I type `unicode(u'Blas Pérez González').encode('utf-8')`, it returns `'Blas P\xc3\xa9rez Gonz\xc3\xa1lez'`. I suppose it isn't encoded correctly, but why? – xaverras Sep 23 '12 at 09:38
  • The edited/displayed character (`é`) depends on the encoding of the editor/viewer/platform. Unfortunately I have only a UTF-8 system at the moment, and no encoding experience in Python. See [u.encode](http://www.evanjones.ca/python-utf8.html). – Joop Eggen Sep 23 '12 at 14:12
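For the follow-up question about repairing rows that were already stored wrongly, one possible approach is to reverse the Latin-1 mis-decoding row by row instead of converting to escapes. A minimal Python 3 sketch, assuming the bad values come back from a `charset='utf8'` connection looking like `u'Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4'` and that the table has a primary key; `table`, `key_column`, and `text_column` are hypothetical names. Back up the table first: a correct string that happens to form valid UTF-8 after Latin-1 encoding would also be changed. MySQL's `latin1` is actually cp1252, so for some characters `wrong_codec="cp1252"` may be needed instead.

```python
import sqlite3


def fix_mojibake(value, wrong_codec="latin-1"):
    """Undo UTF-8 bytes having been decoded with `wrong_codec`.

    Values that cannot be repaired (not text, characters outside the codec,
    or bytes that are not valid UTF-8) are returned unchanged, on the
    assumption that they were stored correctly.
    """
    if not isinstance(value, str):
        return value
    try:
        return value.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


def repair_table(conn, table, key_column, text_column,
                 placeholder="%s", wrong_codec="latin-1"):
    """Rewrite every mis-decoded value in table.text_column; return the count."""
    cur = conn.cursor()
    cur.execute(f"SELECT {key_column}, {text_column} FROM {table}")
    rows = cur.fetchall()  # read everything before issuing UPDATEs
    changed = 0
    for key, value in rows:
        fixed = fix_mojibake(value, wrong_codec)
        if fixed != value:
            cur.execute(
                f"UPDATE {table} SET {text_column} = {placeholder} "
                f"WHERE {key_column} = {placeholder}",
                (fixed, key),
            )
            changed += 1
    conn.commit()
    return changed


if __name__ == "__main__":
    # Stand-in demo: SQLite in memory instead of the MySQL table.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE addr (id INTEGER PRIMARY KEY, txt TEXT)")
    demo.executemany(
        "INSERT INTO addr (id, txt) VALUES (?, ?)",
        [(1, "Blas P\xc3\xa9rez Gonz\xc3\xa1lez, 4"), (2, "Gonz\u00e1lez")],
    )
    print(repair_table(demo, "addr", "id", "txt", placeholder="?"))  # prints 1
```

Row 2 is left alone because `'Gonz\xe1lez'` encoded as Latin-1 is not valid UTF-8, which is how the sketch tells already-correct values apart from broken ones.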