It turns out, that the problem is rather awkward. In short, most variaties and species in MySQL string datatypes map to a single datatype in MySQL's interface with an additional BINARY flag.
Thus, MySQL's VARCHAR
, VARBINARY
, and a string literal map to the same MySQLdb.constants.FIELD_TYPE.VAR_STRING
type in column type definitions, but having an additional MySQLdb.constants.FLAG.BINARY
flag when the type is VARBINARY
or a string collated with a *_bin
collation.
Even though there is a MySQLdb.constants.FIELD_TYPE.VARCHAR
type, I failed to find out when it is used. As I said, MySQL VARCHAR
columns maps to FIELD_TYPE.VAR_STRING
.
The solution becomes rather fragile, if your application uses true binary strings (for example, you store images and fetch them with the same connection as text), since it assumes decoding all binary strings to unicode. Though, it works.
As official docs states:
Because MySQL returns all data as strings and expects you to convert it yourself. This would be a real pain in the ass, but in fact, _mysql can do this for you. (And MySQLdb does do this for you.) To have automatic type conversion done, you need to create a type converter dictionary, and pass this to connect() as the conv keyword parameter.
In practice, real pain in the ass might be the process of constructing your own converters dictionary. But you can import the default one from MySQLdb.converters.conversions
and patch it, or even patch it on an instance of the Connection. The trick is to remove a special converter for a FLAG.BINARY
flag and add a decoder for all cases. If you explicitly specify a charset
parameter for MySQLdb.connect
, it forces use_unicode=1
parameter, which adds the decoder for you, but you can do it yourself:
>>> con = MySQLdb.connect(**params)
>>> con.converter[FIELD_TYPE.VAR_STRING]
[(128, <type 'str'>), (None, <function string_decoder at 0x01FFA130>)]
>>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]
>>> c = con.cursor()
>>> c.execute("SELECT %s COLLATE utf8_bin", u'м')
1L
>>> c.fetchone()
(u'\u043c',)
You might probably need to make the same hack for FIELD_TYPE.STRING
if required.
Another solution is to pass explicit use_unicode=0
to MySQLdb.connect
and make all decodings in your code, but I would not.
Hope, this might be useful to someone.