I have data in an SQL database (MariaDB), some of which contains UTF-8 characters (mostly ÄÖÅ). When I print this data in Python, I don't get the correct characters. However, if I print UTF-8 characters directly (for example `print("ÖÖ ää öö")`), it works.

In my .py I have `# -*- coding: utf-8 -*-` and in my .sql I have `SET character_set_server = "utf8";`

Jerry

1 Answer

http://mysql.rjweb.org/doc.php/charcoll#python says

1st or 2nd line in source code: `# -*- coding: utf-8 -*-`

Python code for dumping hex (etc) for string 'u':

    import unicodedata
    for i, c in enumerate(u):
        print i, '%04x' % ord(c), unicodedata.category(c),
        print unicodedata.name(c)

Miscellany notes on coding for utf8:

⚈  db = MySQLdb.connect(host=DB_HOST, user=DB_USER, passwd=DB_PASS, db=DB_NAME, charset="utf8", use_unicode=True)
⚈  conn = MySQLdb.connect(host="localhost", user='root', password='', db='', charset='utf8')
⚈  cursor.execute("SET NAMES utf8mb4;") -- not as good as using `charset`
⚈  db.set_character_set('utf8'), implies use_unicode=True
⚈  Literals should be u'...'
⚈  MySQL-python 1.2.4 fixes a bug wherein varchar(255) CHARACTER SET utf8 COLLATE utf8_bin is treated like a BLOB.
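
As a rough sketch of the connection settings in the notes above, the keyword arguments can be collected in one place before calling `connect()`. The helper name `utf8_connect_kwargs` and the database name `test` are hypothetical, chosen only for illustration; the keys themselves (`host`, `user`, `passwd`, `db`, `charset`, `use_unicode`) are the ones shown in the notes.

```python
def utf8_connect_kwargs(host, user, passwd, db, charset="utf8"):
    """Build keyword arguments for MySQLdb.connect() that request UTF-8 text.

    Hypothetical helper: it only assembles the arguments, it does not connect.
    """
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": charset,     # encoding the client uses on this connection
        "use_unicode": True,    # ask for text columns as unicode strings
    }


kwargs = utf8_connect_kwargs("localhost", "root", "", "test")

# With the MySQLdb driver installed and a reachable server, one would then do:
#   import MySQLdb
#   conn = MySQLdb.connect(**kwargs)
```

Passing `charset="utf8mb4"` instead is an option if the server and driver support it.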

Checklist:

⚈  `# -*- coding: utf-8 -*-` -- (you have that)
⚈  `charset='utf8'` in `connect()` call -- Is that buried in `bottle_mysql.Plugin`? (Note: Try 'utf-8' and 'utf8')
⚈  Text encoded in utf8.
⚈  No need for encode() or decode() if you are willing to accept utf8 everywhere.
⚈  `u'...'` for literals
⚈  `<meta charset=UTF-8>` near start of html page
⚈  Content-Type: text/html; charset=UTF-8 (in HTTP response header)
⚈  header('Content-Type: text/html; charset=UTF-8'); (in PHP to get that response header)
⚈  `CHARACTER SET utf8 COLLATE utf8_general_ci` on column (or table) definition in MySQL.
⚈  utf8 all the way through
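
When one link in that chain uses a different charset, the usual symptom is mojibake such as `Ã„` in place of `Ä`. The following Python 3 sketch only simulates that situation (no database is involved): it assumes the UTF-8 bytes are misread as cp1252, which is roughly what MySQL calls latin1. The function name `describe` is made up for this example.

```python
import unicodedata


def describe(text):
    """Return one line per character: index, hex code point, category, name."""
    return [
        "%d %04x %s %s" % (
            i, ord(c), unicodedata.category(c), unicodedata.name(c, "<unnamed>")
        )
        for i, c in enumerate(text)
    ]


original = "\u00c4"                       # the single character Ä
stored_bytes = original.encode("utf-8")   # two bytes: C3 84
misread = stored_bytes.decode("cp1252")   # wrong charset: two characters, Ã„
repaired = misread.encode("cp1252").decode("utf-8")  # undo the wrong decode

for line in describe(original) + describe(misread):
    print(line)
```

If the dump of a value read from the database shows two characters where one was expected, the connection charset is a likely culprit.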

References:

⚈  https://docs.python.org/2/howto/unicode.html#the-unicode-type
⚈  http://stackoverflow.com/questions/9154998/python-encoding-mysql
⚈  http://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html

The Python language environment has officially used only UCS-2 internally since version 2.0, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Since Python 2.2, "wide" builds of Unicode are supported, which use UTF-32 instead; these are primarily used on Linux. Python 3.3 no longer ever uses UTF-16. Instead, strings are stored in one of ASCII/Latin-1, UCS-2, or UTF-32, depending on which code points are in the string, with a UTF-8 version also included so that repeated conversions to UTF-8 are fast.
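
The storage widths described for Python 3.3 and later can be observed indirectly. This is a sketch that assumes CPython, since `sys.getsizeof` reports implementation-specific sizes; the helper name `bytes_per_char` is hypothetical.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for a string made only of `ch`.

    Comparing two strings of the same kind cancels out the object header.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00C4": "\u00c4",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    # Internal width in memory versus length of the UTF-8 encoding
    print(label, bytes_per_char(ch), len(ch.encode("utf-8")))
```

Note that the internal width is independent of the UTF-8 length: `Ä` takes one byte per character in memory but two bytes once encoded as UTF-8.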

Rick James