In response to my question "How to speed up data wrangling A LOT in Python + Pandas + sqlAlchemy + MSSQL/T-SQL" I was kindly directed to "Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC" by @IljaEverilä.
NB: for test purposes I am only reading/writing 10k rows.
I added the event listener and a) the function is called, but b) executemany is evidently False, since the if branch is never taken and cursor.fast_executemany is never set.
    import sqlalchemy as sqla
    from sqlalchemy import event

    def namedDbSqlAEngineCreate(dbName):
        # Create an engine and switch to the named db.
        # Returns the engine if successful and None if not.
        # 2018-08-23 added fast_executemany according to https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc?rq=1
        engineStr = 'mssql+pyodbc://@' + defaultDSN  # defaultDSN is defined at module level
        engine = sqla.create_engine(engineStr, echo=False)

        @event.listens_for(engine, 'before_cursor_execute')
        def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
            # print("FUNC call")
            if executemany:
                print('executemany')
                cursor.fast_executemany = True

        try:
            engine.execute('USE ' + dbName)
            return engine
        except sqla.exc.SQLAlchemyError as ex:
            if ex.orig.args[0] == '08004':
                print('namedDbSqlAEngineCreate: database %s does not exist' % dbName)
            else:
                print(ex.args[0])
            return None
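For reference, my understanding is that executemany only comes through as True when a statement is executed with a sequence of parameter sets, as in the following sketch (the table name someTable and column x are placeholders, not my real schema):

    # Sketch: passing a list of parameter dicts dispatches an executemany,
    # which should fire the listener with executemany=True.
    rows = [{'x': i} for i in range(10)]
    engine.execute(sqla.text('INSERT INTO someTable (x) VALUES (:x)'), rows)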
Naturally there is no change in speed.
The to_sql call from my original question is unchanged:

    nasToFillDF.to_sql(name=tempTableName, con=engine.engine, if_exists='replace', chunksize=100, index=False)

I kept chunksize=100 because when I tried setting chunksize=None, per the example, I received the error message (which I had encountered previously):
(pyodbc.ProgrammingError) ('The SQL contains -31072 parameter markers, but 100000 parameters were supplied', 'HY000')
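I suspect −31072 is the 100,000 parameters (10k rows times, presumably, 10 columns) overflowing a signed 16-bit count: 100000 mod 65536 = 34464, which reads as −31072 as a signed 16-bit integer. If that is right, a chunksize that keeps rows × columns under SQL Server's 2100-parameter limit should at least avoid the overflow; a minimal sketch, where safe_chunksize is my own name:

    # Sketch: keep rows_per_chunk * n_columns under SQL Server's
    # 2100-parameter limit, assuming each cell binds one parameter
    # in a single multi-row INSERT.
    n_columns = len(nasToFillDF.columns)
    safe_chunksize = 2100 // n_columns  # e.g. 10 columns -> 210 rows per chunk
    nasToFillDF.to_sql(name=tempTableName, con=engine.engine,
                       if_exists='replace', chunksize=safe_chunksize, index=False)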
What have I done wrong? My guess is that the executemany argument passed to receive_before_cursor_execute is never True, but if that is the problem I have no idea how to fix it.
Setup: pyodbc 4.0.23, SQLAlchemy 1.2.6, Python 3.6.something.
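For completeness: I understand SQLAlchemy 1.3+ accepts a fast_executemany=True flag directly in create_engine for the mssql+pyodbc dialect, which would make the event listener unnecessary, but I am pinned to 1.2.6. A sketch of what that would look like:

    # Sketch: SQLAlchemy 1.3+ only; makes the before_cursor_execute listener unnecessary.
    engine = sqla.create_engine('mssql+pyodbc://@' + defaultDSN, fast_executemany=True)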