
My Python 2.7 program does a lot of reading from SQL Server. One of the columns is defined as varchar(40) and usually holds a string around 20 characters long. When I profile my code, I find that a large amount of the time is spent decoding strings:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
919870    1.133    0.000    1.133    0.000 {_codecs.utf_8_decode}
919870    0.463    0.000    1.596    0.000 utf_8.py:15(decode)

Example code is (reading millions of rows):

values = {}
cursor = db.cursor()
cursor.execute("select qaid, value from DATA")
rows = cursor.fetchall()
for row in rows:
    qaid, value = row
    values[qaid] = value
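
For reference, the profile above was collected roughly like this; load_values is a hypothetical wrapper around the read loop above:

import cProfile

# Sketch only: run the hypothetical load_values() under the profiler and
# sort by total time per function, which is what surfaces hotspots such as
# {_codecs.utf_8_decode} in the output above.
cProfile.run('load_values()', sort='tottime')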

This seems to come from _mssql, which automatically decodes data coming from SQL Server when the column type is varchar:

elif dbtype in (SQLVARCHAR, SQLCHAR, SQLTEXT):
    if strlen(self._charset):
        return (<char *>data)[:length].decode(self._charset)
    else:
        return (<char *>data)[:length]

The database is configured with the collation Latin1_General_BIN. I am using Python 2.7. The strings I am interested in are always ASCII.

Is there some way to make it skip the decoding? Passing an empty charset to the connection attempt does not work for me.

  • As answered by Dror Asaf, adding charset='LATIN1' solved my problem. I tried fetchmany(); it did not provide a noticeable improvement in performance (if anything, it was slightly slower). – Investment Biker Apr 21 '15 at 17:35

1 Answer


If the database uses a specific character set encoding, it can be passed to the connect function as the charset parameter.

The default encoding for any connection is "UTF-8":

pymssql.connect(server='.', user='', password='', database='', timeout=0,
                login_timeout=60, charset='UTF-8', as_dict=False, host='',
                appname=None, port='1433', conn_properties=None)

http://pymssql.org/en/latest/ref/pymssql.html
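
For this question, a minimal sketch of the fix (server, credentials, and database name are placeholders): per the asker's comment above, charset='LATIN1' matches the Latin1_General_BIN collation and avoids the UTF-8 decode hotspot.

import pymssql

# Sketch only: connection details are placeholders; charset is the fix.
conn = pymssql.connect(server='myserver', user='me', password='secret',
                       database='mydb', charset='LATIN1')
cursor = conn.cursor()
cursor.execute("select qaid, value from DATA")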

Please note that the performance might also be related to how the cursor is used.

This has been discussed earlier here.

Using fetchall might lead to bad performance when the table has many rows; consider testing with multiple calls to fetchmany instead, as sketched below.
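
A minimal sketch of the batched approach (the batch size of 10000 is an arbitrary choice for illustration, not a value from the answer):

values = {}
cursor.execute("select qaid, value from DATA")
while True:
    rows = cursor.fetchmany(10000)  # next batch; empty list when exhausted
    if not rows:
        break
    for qaid, value in rows:
        values[qaid] = value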

– Dror Asaf