I am trying to upload data to a MS Azure Sql database using pandas to_sql and it takes very long. I often have to run it before I go to bed and wake up in the morning and it is done but has taken several hours and if there is an error that comes up I am not able to address it. Here is the code I have:
params = urllib.parse.quote_plus(
'Driver=%s;' % driver +
'Server=%s,1433;' % server +
'Database=%s;' % database +
'Uid=%s;' % username +
'Pwd={%s};' % password +
'Encrypt=yes;' +
'TrustServerCertificate=no;'
)
conn_str = 'mssql+pyodbc:///?odbc_connect=' + params
engine = create_engine(conn_str)
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
if executemany:
cursor.fast_executemany = True
cursor.commit()
connection = engine.connect()
connection
Then I run this command for the sql ingestion:
master_data.to_sql('table_name', engine, chunksize=500, if_exists='append', method='multi',index=False)
I have played around with the chunksize and the sweet spot seems to be 100, which isn't fast enough considering I am usually trying to upload 800,000-2,000,000 records at a time. If I increase it beyond that I will get an error which seems to only be related to the chunk size.
OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure (0) (SQLExecDirectW)')