
This is my code:

def insertDataFrameInDB(cursor, dataFrame, toTable, fieldNames=None):
    # Restrict the frame to the requested columns, or insert all of them
    if fieldNames:
        dataFrame = dataFrame[fieldNames]
    else:
        fieldNames = dataFrame.columns

    # Convert every value to a stripped string
    for col in dataFrame.columns.values:
        dataFrame[col] = dataFrame[col].map(str).map(str.strip)
    params = [tuple(x) for x in dataFrame.values]

    # Build "INSERT INTO <table> (<cols>) VALUES(?, ...)" and bulk-insert
    fieldNameStr = ",".join(fieldNames)
    valueStr = ",".join(["?"] * len(fieldNames))
    sql = "INSERT INTO {} ({}) VALUES({})".format(toTable, fieldNameStr, valueStr)
    cursor.fast_executemany = True
    cursor.executemany(sql, params)
    cursor.commit()


insertDataFrameInDB(cursor, df, "table")

It raises the following error, which I haven't been able to resolve:

DataError                                 Traceback (most recent call last)
DataError: ('String data, right truncation: length 24 buffer 20', '22001')

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\encodings\utf_16_le.py in decode(input, errors)
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_16_le_decode(input, errors, True)
     17 

SystemError: <built-in function utf_16_le_decode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
SystemError: decoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_decode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\encodings\utf_16_le.py in decode(input, errors)
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_16_le_decode(input, errors, True)
     17 

SystemError: <built-in function utf_16_le_decode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
SystemError: decoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_decode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-6-f73d9346f943> in <module>()
     12 
     13 cursor = getCursor(conData)
---> 14 insertDataFrameInDB(cursor, df, "snowplow.sankey")

<ipython-input-1-69ecbca20fc8> in insertDataFrameInDB(cursor, dataFrame, toTable, fieldNames)
     29     sql = "INSERT INTO {} ({}) VALUES({})".format(toTable, fieldNameStr, valueStr)
     30     cursor.fast_executemany = True
---> 31     cursor.executemany(sql, params)
     32     cursor.commit()
SystemError: <class 'pyodbc.Error'> returned a result with an error set

A lot of error searching makes me think it has something to do with the lack of a BOM. I tried decoding the strings in the "params" tuples, and also tried str.astype('U'). Does anybody know what causes the problem, and how to address it?

  • If you are using pyodbc 4.0.22 then try downgrading to 4.0.21 (`pip install pyodbc==4.0.21`) and see if that helps. – Gord Thompson Feb 22 '18 at 13:06
  • Thanks for the input, but the error remains unfortunately – Chielio Feb 22 '18 at 13:10
  • Do the errors persist if you use `cursor.fast_executemany = False` ...? Also, what ODBC driver are you using? – Gord Thompson Feb 22 '18 at 13:44
  • It is solved by setting cursor.fast_executemany = False, thanks! But I have a lot of data, and I read that with True it executes about 20 times faster. Isn't there a way to still use the fast_executemany? – Chielio Feb 22 '18 at 13:53
  • `fast_executemany` may not be compatible with all ODBC drivers. Which driver are you using? – Gord Thompson Feb 22 '18 at 14:33
  • ODBC Driver 13 for SQL Server – Chielio Feb 22 '18 at 15:06
  • My problem had nothing to do with `executemany`; one of my columns contained Python lists, which caused the error above. Converting the column to string with `df[col].astype(str)` solved the problem for me. – Erfan Jul 21 '21 at 10:09
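
A minimal sketch of the two workarounds from the comments, written as changes to the question's insertDataFrameInDB (listCol is a hypothetical column name, not from the question):

# Before building params: if a column holds Python lists (Erfan's case),
# cast it to strings; "listCol" is a placeholder column name.
dataFrame["listCol"] = dataFrame["listCol"].astype(str)

# In place of cursor.fast_executemany = True: disabling the feature avoids
# the utf-16le decode error, at the cost of slower bulk inserts.
cursor.fast_executemany = False
cursor.executemany(sql, params)
cursor.commit()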

1 Answer


You are using Microsoft's "ODBC Driver ... for SQL Server", so fast_executemany should work with pyodbc 4.0.21. However, you can invoke that feature while still using DataFrame#to_sql by means of SQLAlchemy execution events, as illustrated in this question.

Example: The following code does not take advantage of fast_executemany:

import pandas as pd
from sqlalchemy import create_engine
import time

engine = create_engine('mssql+pyodbc://@SQL_panorama')  # ODBC DSN named SQL_panorama

# test environment
num_rows = 1000
df = pd.DataFrame(
    [[x, f'row{x:03}'] for x in range(num_rows)],
    columns=['id', 'txt']
)
# (re)create the test table
cnxn = engine.connect()
try:
    cnxn.execute("DROP TABLE df_to_sql_test")
except Exception:
    pass
cnxn.execute("CREATE TABLE df_to_sql_test (id INT PRIMARY KEY, txt NVARCHAR(50))")

# timing test
t0 = time.time()
df.to_sql("df_to_sql_test", engine, if_exists='append', index=False)
print(f"{num_rows} rows written in {(time.time() - t0):.1f} seconds")

Result:

1000 rows written in 25.2 seconds

Adding a SQLAlchemy execution event handler reduces the execution time significantly:

import pandas as pd
from sqlalchemy import create_engine, event
import time

engine = create_engine('mssql+pyodbc://@SQL_panorama')  # ODBC DSN named SQL_panorama

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # turn on pyodbc's fast_executemany only for bulk (executemany) calls
    if executemany:
        cursor.fast_executemany = True


# test environment
num_rows = 1000
df = pd.DataFrame(
    [[x, f'row{x:03}'] for x in range(num_rows)],
    columns=['id', 'txt']
)
# (re)create the test table
cnxn = engine.connect()
try:
    cnxn.execute("DROP TABLE df_to_sql_test")
except Exception:
    pass
cnxn.execute("CREATE TABLE df_to_sql_test (id INT PRIMARY KEY, txt NVARCHAR(50))")

# timing test
t0 = time.time()
df.to_sql("df_to_sql_test", engine, if_exists='append', index=False)
print(f"{num_rows} rows written in {(time.time() - t0):.1f} seconds")

Result:

1000 rows written in 1.6 seconds

For a more complete discussion of this approach, see this answer.
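
As a side note, SQLAlchemy 1.3 and later let the mssql+pyodbc dialect enable this feature directly through a create_engine flag, making the event handler unnecessary; a minimal sketch, assuming the same SQL_panorama DSN as above:

import pandas as pd
from sqlalchemy import create_engine

# fast_executemany=True is passed straight through to pyodbc by the
# mssql+pyodbc dialect (SQLAlchemy 1.3+)
engine = create_engine('mssql+pyodbc://@SQL_panorama', fast_executemany=True)

df = pd.DataFrame(
    [[x, f'row{x:03}'] for x in range(1000)],
    columns=['id', 'txt']
)
df.to_sql("df_to_sql_test", engine, if_exists='append', index=False)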

Gord Thompson