2

I'm having problems with reading utf-8 data from a MySQL DB using Python. My DB contains one table named Videos, and the table contains at least one row that has Unicode chars, i.e.

[KR] Samsung Galaxy Beam 2 간단 리뷰 [4K]

The collation of the table is utf8_general_ci, just like the collation of the fields in the table.

This is the code I wrote in order to fetch all the data from my table:

# Open database connection
db = MySQLdb.connect("localhost","matan","pass","youtube", charset = 'utf8',use_unicode=True)

# prepare a cursor object using cursor() method
cursor = db.cursor()

# Prepare SQL query to INSERT a record into the database.
sql = "SELECT * FROM VIDEOS"
try:
   # Execute the SQL command
   cursor.execute(sql)
   # Fetch all the rows in a list of lists.
   results = cursor.fetchall()
   for row in results:
      title = row[0]
      link = row[1]
      # Now print fetched result
      print ("title=%s\nlink=%s\n\n" % \
            (title, link))
except:
   print "Error: unable to fecth data"

# disconnect from server
db.close()

When I run the above code, it prints all the rows that contain only "ascii" chars, but when it gets to a row that contains Unicode char (i.e. the line I mentioned above), it prints :

File "C:\Users\Matan\Dropbox\Code\Python\youtube.py", line 28, in printall (title, link)) File "C:\Python27\lib\encodings\cp862.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode characters in position 33-34: c haracter maps to <undefined>

and doesn't continue to the next row.

I'm using PhpMyAdmin version 4.1.14, MySQL version 5.6.17, and Python version 2.7.8 .

Edit: I dropped the except clause, and updated the error I've got.

matan89
  • 33
  • 1
  • 1
  • 6
  • If you really want to know what goes wrong, first get rid of this more-than-useless bare except clause and let the real exception propagate. Then please come back and post the full traceback. Just for the record: since you pass `use_unicode=True` to your connection, all you get back from the db are unicode strings (type `Unicode`, not type `str`). You have to encode them to the proper encoding before printing them (the proper encoding being system-dependant). – bruno desthuilliers Dec 03 '14 at 16:05
  • I edited my question with your advice. thanks. – matan89 Dec 03 '14 at 17:01

1 Answers1

3

Your problem is with your terminal (sys.stdout) encoding (cf http://en.wikipedia.org/wiki/Code_page_862), which depends on your system's settings. The best solution (as explained here : https://stackoverflow.com/a/15740694/41316) is to explicitely encode your unicode data before printing them to sys.stdout.

If you can't use a more usable encoding (utf-8 comes to mind, as it has been designed to handle all unicode characters), you can at least use an alternative error handling like "replace" (replaces non-encodable characters with '?') or "ignore" (suppress non-encodable characters).

Here's a corrected version of your code, you can play with the encoding and on_error settings to find out what solution works for you:

import sys
import MySQLdb

# set desired output encoding here
# it looks like your default encoding is "cp862"
# but you may want to first try 'utf-8' first
# encoding = "cp862"
encoding = "utf-8" 

# what do when we can't encode to the desired output encoding
# options are:
# - 'strict' : raises a UnicodeEncodeError (default)
# - 'replace': replaces missing characters with '?'
# - 'ignore' : suppress missing characters
on_error = "replace" 

db = MySQLdb.connect(
   "localhost","matan","pass","youtube", 
   charset='utf8',
   use_unicode=True
   )
cursor = db.cursor()
sql = "SELECT * FROM VIDEOS"
try:
   cursor.execute(sql)
   for i, row in enumerate(cursor):
      try:
         # encode unicode data to the desired output encoding
         title = row[0].encode(encoding, on_error)
         link = row[1].encode(encoding, on_error)
      except UnicodeEncodeError as e:
         # only if on_error='strict'
         print >> sys.stderr, "failed to encode row #%s - %s" % (i, e)
      else:
         print "title=%s\nlink=%s\n\n" % (title, link))
finally:
   cursor.close()
   db.close()

NB : you may also want to read this (specially the comments) http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/ for more on Python, strings, unicode, encoding, sys.stdout and terminal issues.

Community
  • 1
  • 1
bruno desthuilliers
  • 75,974
  • 6
  • 88
  • 118