My Python 2.7 program does a lot of reading from SQL Server. One of the columns is defined as varchar(40) and usually holds a string of around 20 characters. When I profiled the code, I found that a large share of the time is spent decoding strings:
ncalls tottime percall cumtime percall filename:lineno(function)
919870 1.133 0.000 1.133 0.000 {_codecs.utf_8_decode}
919870 0.463 0.000 1.596 0.000 utf_8.py:15(decode)
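(The output above is in cProfile/pstats format; a minimal sketch of how such a profile can be collected is below, with load_values standing in for my actual loading function.)

import cProfile, pstats

cProfile.run("load_values()", "load.prof")                      # profile the load
pstats.Stats("load.prof").sort_stats("cumulative").print_stats(20)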
Example code (reading millions of rows):
cursor = db.cursor()
cursor.execute("select qaid, value from DATA")
rows = cursor.fetchall()
values = {}                    # maps qaid -> value
for row in rows:
    qaid, value = row
    values[qaid] = value
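To see where the time goes within this snippet, it helps to separate the fetch from the dictionary build (a rough sketch; the comments reflect my reading of the profile, not a guarantee of where the driver does its work):

import time

cursor = db.cursor()
cursor.execute("select qaid, value from DATA")

t0 = time.time()
rows = cursor.fetchall()       # the utf_8_decode calls appear to happen in here
t1 = time.time()

values = dict(rows)            # plain Python dict build, comparatively cheap
t2 = time.time()

print "fetchall: %.2fs  dict build: %.2fs" % (t1 - t0, t2 - t1)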
This seems to come from _mssql (pymssql's C extension), which automatically decodes data coming from SQL Server when the column type is varchar:
elif dbtype in (SQLVARCHAR, SQLCHAR, SQLTEXT):
    if strlen(self._charset):
        return (<char *>data)[:length].decode(self._charset)
    else:
        return (<char *>data)[:length]
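In plain Python terms, that branch does roughly the following for every varchar/char/text value (a paraphrase for illustration, not the actual driver code):

def convert_value(data, charset):
    # data: raw bytes for one SQLVARCHAR/SQLCHAR/SQLTEXT value
    if charset:
        # e.g. 'hello'.decode('UTF-8') -> u'hello'; this is the call that
        # shows up as _codecs.utf_8_decode in the profile
        return data.decode(charset)
    # no charset configured: the raw byte string is passed through as-is
    return data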
The database is configured with the Latin1_General_BIN collation. I am using Python 2.7, and the strings I care about are always ASCII.
Is there some way to make it skip the decoding? Passing an empty charset to the connection call does not work for me.
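For reference, this is roughly the connection attempt referred to above (assuming the pymssql DB-API layer; server, credentials and database are placeholders):

import pymssql

# charset="" was my attempt to make _mssql skip the .decode() call on each
# varchar value, but it does not work for me
db = pymssql.connect(server="myserver", user="myuser", password="secret",
                     database="mydb", charset="")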