I connect to MySQL and retrieve usernames containing 'Ö', 'ğ', 'Ş', etc. It works fine with MySQL or PHP, but in Python 2.6.8 an error occurs. Here is my code:

#C:\Python27\Lib\encodings
#-*- coding: utf-8 -*-
import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.Connect(host="localhost", user="root", passwd="mypass", db="mydb", charset="utf8", init_command="SET NAMES UTF8")
cursor = conn.cursor(MySQLdb.cursors.DictCursor)
cursor.execute("select * from users")
tmpDict = cursor.fetchallDict()
print tmpDict[0]['NAME'].decode('utf8')

I expect 'Ömer Şirin' here but instead I get the following error:

'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

How can I fix this?

dsh
  • Why do you want to decode('utf8')? – Ritobroto Mukherjee Nov 13 '15 at 10:18
  • Can you `print repr(tmpDict[0]['NAME'])`? – SuperBiasedMan Nov 13 '15 at 10:18
  • @SuperBiasedMan result=None –  Nov 13 '15 at 10:24
  • @RitobrotoMukherjee I don't want to decode anything, I just want my strings to come out correctly –  Nov 13 '15 at 10:25
  • Curious, is that what you got when you replaced `print tmpDict[0]['NAME'].decode('utf8')`? Or did you put that line of code somewhere else? – SuperBiasedMan Nov 13 '15 at 10:26
  • Does setting locale as demonstrated here: http://stackoverflow.com/questions/27347772/print-unicode-string-in-python-regardless-of-environment help you? – Roman Susi Nov 13 '15 at 10:28
  • Also, I think it should be .encode('utf-8') from Unicode, not the other way. If result is in utf-8, you do not need to encode or decode anything. – Roman Susi Nov 13 '15 at 10:30
  • @SuperBiasedMan 'UnicodeEncodeError'... –  Nov 13 '15 at 10:32
  • @RomanSusi Similar situation, but I can't solve it with the most-voted answer. I think the problem is that the Python file encoding is 'ascii'. My data is already UTF-8, but some UTF-8 characters are not in the ASCII range –  Nov 13 '15 at 10:36
  • @MehmetYenerYILMAZ: what happens if you add three lines: `print type(tmpDict[0]['NAME'])` `print repr(tmpDict[0]['NAME'])` `print u'\xd6'` to the code and copy-paste the results **literally** (it is possible that `repr()` is `result=None` but it is *very* unusual). – jfs Nov 13 '15 at 20:11
  • The problem might be related to the output operation, not the retrieval from the DB, and there are hundreds of postings here discussing this. That said, why are you using an ancient Python 2 version instead of a recent Python 3? – Ulrich Eckhardt Nov 13 '15 at 20:21

2 Answers

There are two separate errors, each with its own cause:

  1. UnicodeEncodeError:'charmap' with 'Ö','Ç' etc
  2. 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

If type(tmpDict[0]['NAME']) == unicode then the second issue can be reproduced easily:

>>> u'\xd6'.decode('utf-8') #XXX BROKEN, DO NOT DO IT!!!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

What happens is that u'\xd6' is already a Unicode string, so before Python 2 can decode it, it first has to convert it to bytes, and it uses the default encoding ('ascii') to do so. That implicit encode is what fails. The correct solution is to drop .decode('utf-8'): do not decode Unicode strings. (This is fixed in Python 3, where trying to decode a Unicode string raises AttributeError.)
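If the code cannot be sure whether a value is already Unicode, a small guard avoids the double decode. The following is a minimal sketch, not part of MySQLdb: the helper name to_text is hypothetical, and it assumes byte strings coming from the database are UTF-8. It is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as Unicode text.

    Byte strings are decoded (UTF-8 is an assumption about the data);
    Unicode strings are returned untouched, so the implicit ASCII encode
    described above never happens.
    """
    if isinstance(value, text_type):
        return value  # already Unicode, nothing to decode
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


# Both calls produce the same Unicode string.
already_unicode = to_text(u"\xd6mer")
from_utf8_bytes = to_text(b"\xc3\x96mer")
```

With such a guard, tmpDict[0]['NAME'] could be passed through to_text() regardless of whether the driver returned bytes or Unicode.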

The first issue, "UnicodeEncodeError: 'charmap'", is likely caused by printing Unicode to the Windows console. To reproduce it, run print u'\xd6'. To fix it, install win-unicode-console.
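To see why the console encoding matters, the sketch below checks which characters a given encoding can represent. This is a lossy fallback for illustration only, not what win-unicode-console does: the helper name printable is hypothetical, and characters the encoding cannot represent are replaced with '?'.

```python
# -*- coding: utf-8 -*-


def printable(text, encoding):
    """Hypothetical helper: make `text` safe to print with `encoding`.

    Characters that `encoding` cannot represent become '?', which avoids
    UnicodeEncodeError at the cost of losing those characters.
    """
    return text.encode(encoding, "replace").decode(encoding)


name = u"\xd6mer"  # 'Ömer' written with an escape so the file stays ASCII

ascii_version = printable(name, "ascii")  # the 'Ö' cannot be encoded here
utf8_version = printable(name, "utf-8")   # UTF-8 can represent every character
```

In practice the encoding would come from sys.stdout.encoding, which can be None when output is redirected on Python 2, so a caller would need a default for that case.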

jfs

With charset="utf8" set on the connection, the MySQL driver automatically decodes UTF-8 strings into Python Unicode objects.

You should be able to prove this with:

>>> type(tmpDict[0]['NAME'])
<type 'unicode'>

You should be able to print tmpDict[0]['NAME'] straight to the console. If you still have problems printing, look up the new exception on Stack Overflow.

Alastair McCormack