I am trying to insert millions of records into SQL Server using pyodbc. I was able to load a couple of hundred thousand records without any issues, but it takes around 20 minutes. I want to achieve faster ingestion; please suggest any approaches for that.
I am not able to use BULK INSERT because the server is remote from the client machine. I also sometimes run into the errors 'TCP Provider: An existing connection was forcibly closed by the remote host.' and 'Communication link failure'. Can I re-establish the connection after inserting each chunk, for example along the lines of the sketch below? If I do that, will it cause a bigger performance problem?
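This is roughly what I mean by reconnecting per chunk; it is only a sketch, and `conn_string` is a placeholder for my actual connection string:

```python
import pyodbc

def insert_chunk_on_fresh_connection(conn_string, insert_query, chunk):
    # Open a new connection just for this chunk, then close it again.
    connection = pyodbc.connect(conn_string)
    try:
        cursor = connection.cursor()
        cursor.fast_executemany = True
        cursor.executemany(insert_query, chunk)
        connection.commit()
    finally:
        connection.close()
```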
Any answers would be appreciated.
Thank you!
Steps I have followed so far:
- set `cursor.fast_executemany = True`
- divided the data into chunks of 10_000 rows and loaded them with the loop below:
```python
import time

# chunks holds the rows split into lists of 10_000 parameter tuples
for i, chunk in enumerate(chunks, start=1):
    start_time = time.time()
    cursor.executemany(insert_query, chunk)
    connection.commit()
    end_time = time.time()
    time_taken = end_time - start_time
    print(f'{i * 10_000} records inserted in {time_taken}')
```
I am using ODBC Driver 17 for SQL Server and pyodbc 4.0.22.
Also, please let me know whether SQLAlchemy is better than plain pyodbc for faster ingestion.
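If SQLAlchemy is the way to go, this is roughly what I would try, assuming the `mssql+pyodbc` dialect; the connection URL, table, and column names below are placeholders, not my real schema:

```python
from sqlalchemy import create_engine, text

# fast_executemany is passed through to pyodbc by the mssql+pyodbc dialect.
engine = create_engine(
    "mssql+pyodbc://user:password@my_dsn",  # placeholder connection URL
    fast_executemany=True,
)

insert_query = text(
    "INSERT INTO my_table (col1, col2) VALUES (:col1, :col2)"  # placeholder table/columns
)

# Each chunk would be a list of dicts matching the named parameters above.
chunk = [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]  # example rows

# engine.begin() commits when the block exits without an error.
with engine.begin() as conn:
    conn.execute(insert_query, chunk)
```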