http://mysql.rjweb.org/doc.php/charcoll#python says:
The 1st or 2nd line of the source file should be: `# -*- coding: utf-8 -*-`
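Python code for dumping the hex code point (plus Unicode category and name) of each character in the string `u`:
```python
from __future__ import print_function  # same code runs on Python 2 and 3
import unicodedata

for i, c in enumerate(u):  # u: the unicode string to inspect
    # index, hex code point, category (e.g. 'Ll'), name ('?' if it has none)
    print(i, '%04x' % ord(c), unicodedata.category(c), unicodedata.name(c, '?'))
```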
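A Python 3 variant of the dump, as a sketch: the helper `dump_chars` is a hypothetical name, not from the source. Besides the code point, it also shows the UTF-8 bytes each character encodes to, which is what you would expect MySQL's `HEX()` to report for the same text stored in a utf8/utf8mb4 column (e.g. 'é' is code point 00e9 but bytes C3A9).

```python
import unicodedata


def dump_chars(u):
    """Hypothetical helper: describe each character of the str `u`.

    Returns (index, hex code point, UTF-8 bytes as uppercase hex,
    category, name) tuples, one per character.
    """
    rows = []
    for i, c in enumerate(u):
        rows.append((
            i,
            '%04x' % ord(c),
            c.encode('utf-8').hex().upper(),
            unicodedata.category(c),
            unicodedata.name(c, '?'),  # controls such as '\n' have no name
        ))
    return rows


for row in dump_chars('caf\u00e9'):
    print(*row)
```

Keeping the code point and the byte sequence side by side makes it easier to tell "the wrong character was stored" apart from "the right character was stored but decoded wrongly".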
Miscellaneous notes on coding for utf8:
⚈ db = MySQLdb.connect(host=DB_HOST, user=DB_USER, passwd=DB_PASS, db=DB_NAME, charset="utf8", use_unicode=True)
⚈ conn = MySQLdb.connect(host="localhost", user='root', password='', db='', charset='utf8')
⚈ cursor.execute("SET NAMES utf8mb4;") -- not as good as using `charset` in connect()
⚈ db.set_character_set('utf8'), implies use_unicode=True
⚈ Literals should be u'...'
⚈ MySQL-python 1.2.4 fixes a bug wherein varchar(255) CHARACTER SET utf8 COLLATE utf8_bin is treated like a BLOB.
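Why `utf8mb4` (as in the `SET NAMES utf8mb4` line) can matter: MySQL's `utf8` stores at most 3 bytes per character, so characters above U+FFFF (most emoji, for example) need `utf8mb4`. A minimal Python 3 sketch, using a hypothetical helper name, that checks whether a string would fit a 3-byte `utf8` column:

```python
def needs_utf8mb4(text):
    """Hypothetical helper: True if `text` contains a character that MySQL's
    3-byte `utf8` charset cannot store.

    Code points above U+FFFF (outside the Basic Multilingual Plane) are
    exactly the ones that take 4 bytes in UTF-8.
    """
    return any(ord(c) > 0xFFFF for c in text)


for s in ['caf\u00e9', '\u20ac', 'smile \U0001F600']:
    # Longest UTF-8 encoding of any single character, next to the verdict
    longest = max(len(c.encode('utf-8')) for c in s)
    print(repr(s), longest, needs_utf8mb4(s))
```

Checking code points rather than encoded bytes keeps the helper usable on any str without worrying about encode errors.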
Checklist:
⚈ `# -*- coding: utf-8 -*-` -- as the 1st or 2nd line of each source file
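⚈ `charset='utf8'` in the `connect()` call -- if a wrapper such as `bottle_mysql.Plugin` makes the connection, check that it passes this through. (Note: MySQL spells the charset 'utf8' (or 'utf8mb4'), without the hyphen used in Python's 'utf-8'.)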
⚈ Text encoded in utf8.
⚈ No need for encode() or decode() if you are willing to accept utf8 everywhere.
⚈ `u'...'` for literals
⚈ `<meta charset=UTF-8>` near the start of the HTML page
⚈ Content-Type: text/html; charset=UTF-8 (in HTTP response header)
⚈ header('Content-Type: text/html; charset=UTF-8'); (in PHP to get that response header)
⚈ `CHARACTER SET utf8 COLLATE utf8_general_ci` on column (or table) definition in MySQL.
⚈ utf8 all the way through
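Why every step has to agree: if UTF-8 bytes are decoded with some other encoding at any one step, each multi-byte character turns into several wrong ones. A minimal Python 3 sketch of that failure and the clean round trip; Python's `latin-1` codec is an assumption standing in for whatever wrong single-byte decoding happens in practice.

```python
text = 'caf\u00e9'             # 'café' as a Python 3 str
raw = text.encode('utf-8')     # b'caf\xc3\xa9': what a UTF-8 client sends

# Correct: decode with the same encoding that produced the bytes.
print(raw.decode('utf-8') == text)  # True

# Wrong: read the same bytes as latin-1; 'é' (C3 A9) becomes two characters.
garbled = raw.decode('latin-1')
print(garbled)                 # cafÃ©

# latin-1 maps every byte to a character, so this reverses the damage here.
# It is a repair trick, not a substitute for declaring utf8 at each step.
print(garbled.encode('latin-1').decode('utf-8'))  # café
```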
References:
⚈ https://docs.python.org/2/howto/unicode.html#the-unicode-type
⚈ http://stackoverflow.com/questions/9154998/python-encoding-mysql
⚈ http://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html
From version 2.0 on, the Python language environment officially used only UCS-2 internally, but its UTF-8 decoder to "Unicode" produces correct UTF-16 (characters above U+FFFF become surrogate pairs). Since Python 2.2, "wide" Unicode builds that use UTF-32 instead are also supported;[16] these are primarily used on Linux. Since Python 3.3, strings never use UTF-16: each string is stored as ASCII/Latin-1, UCS-2, or UTF-32 (1, 2, or 4 bytes per code point), depending on the largest code point it contains, and a UTF-8 copy is cached once it has been requested, so that repeated conversions to UTF-8 are fast.
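A quick way to see the 3.3+ storage on your own interpreter, as a sketch that relies on CPython implementation details (`sys.getsizeof` values are not guaranteed by the language; `bytes_per_char` is a hypothetical helper): adding one more character grows the string object by 1, 2, or 4 bytes depending on the widest code point in it. A non-BMP character also counts as one character, where a narrow 2.x build stored it as a UTF-16 surrogate pair.

```python
import sys


def bytes_per_char(ch):
    """Hypothetical helper: extra bytes CPython uses per character for a
    string made of `ch` (size difference between 101 and 100 copies)."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


for ch in ['a', '\u00e9', '\u20ac', '\U0001F600']:
    print('U+%04X' % ord(ch), bytes_per_char(ch))

emoji = '\U0001F600'
print(len(emoji))                        # 1 code point (a narrow 2.x build said 2)
print(emoji.encode('utf-16-be').hex())   # d83dde00: the surrogate pair
```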