
I am trying to insert millions of records into SQL Server using pyodbc. I was able to load a couple of hundred thousand records without any issues, although it takes around 20 minutes. I want to achieve faster ingestion, so please suggest any approaches that could help.

I am not able to use BULK INSERT because the server is remote from the client. I also sometimes get the errors 'TCP Provider: An existing connection was forcibly closed by the remote host.' and 'Communication link failure'. Can I re-establish the connection after inserting every chunk? If I do, will it cause a bigger performance hit?
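
For context, this is roughly what I mean by reconnecting per chunk, with a retry on link failures (the connection string and retry counts are placeholders, not my actual setup):

    import time
    import pyodbc

    # Placeholder connection string -- not my real server or credentials.
    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret")

    def insert_chunk_with_retry(insert_query, chunk, retries=3):
        """Open a fresh connection for one chunk and retry on link failures."""
        for attempt in range(1, retries + 1):
            connection = pyodbc.connect(CONN_STR, timeout=30)  # 30 s login timeout
            try:
                cursor = connection.cursor()
                cursor.fast_executemany = True
                cursor.executemany(insert_query, chunk)
                connection.commit()
                return
            except pyodbc.Error:
                # e.g. 'Communication link failure' -- back off and retry
                if attempt == retries:
                    raise
                time.sleep(2 * attempt)
            finally:
                connection.close()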

Any answers would be appreciated.

Thank you!

Steps I have followed so far:

  1. Set cursor.fast_executemany = True.
  2. Divided the data into chunks of 10_000 rows and loaded them in a loop (the connection setup and chunking that feed this loop are sketched after this list):

       for i, chunk in enumerate(chunks, start=1):
           start_time = time.time()
           cursor.executemany(insert_query, chunk)
           connection.commit()
           end_time = time.time()
           time_taken = end_time - start_time
           print(f'{i * 10_000} records inserted in {time_taken} seconds')
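
For completeness, the connection setup and chunking that feed the loop above look roughly like this sketch (the server, credentials, table, and column names are placeholders, not my actual schema):

    import time
    import pyodbc

    # Placeholder connection details -- not my real server or credentials.
    connection = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret"
    )
    cursor = connection.cursor()
    cursor.fast_executemany = True

    # Placeholder table and columns; the real table is mostly NVARCHAR.
    insert_query = "INSERT INTO dbo.MyTable (col1, col2, col3) VALUES (?, ?, ?)"

    def chunked(rows, size=10_000):
        """Yield successive lists of `size` parameter tuples."""
        chunk = []
        for row in rows:
            chunk.append(row)
            if len(chunk) == size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

    all_rows = []  # placeholder: the real rows come from my source data
    chunks = chunked(all_rows)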

I am using ODBC Driver 17 for SQL Server and pyodbc 4.0.22.

Also, please let me know whether SQLAlchemy would be better than plain pyodbc for faster ingestion.
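
If it helps to be concrete, this is the kind of SQLAlchemy setup I was considering (the connection URL, table, and rows are placeholders); as I understand it, it still drives pyodbc and fast_executemany underneath:

    import sqlalchemy as sa

    # Placeholder connection URL -- not my real server or credentials.
    engine = sa.create_engine(
        "mssql+pyodbc://user:secret@myserver/mydb"
        "?driver=ODBC+Driver+17+for+SQL+Server",
        fast_executemany=True,  # passed through to the pyodbc cursor
    )

    insert_query = sa.text(
        "INSERT INTO dbo.MyTable (col1, col2, col3) VALUES (:c1, :c2, :c3)"
    )

    # One transaction per batch; a list of dicts is executed as executemany.
    with engine.begin() as conn:
        conn.execute(insert_query, [
            {"c1": "a", "c2": "b", "c3": "c"},  # placeholder rows
            {"c1": "d", "c2": "e", "c3": "f"},
        ])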

  • have you tried the methods in this [link](https://towardsdatascience.com/how-i-made-inserts-into-sql-server-100x-faster-with-pyodbc-5a0b5afdba5)? This "converts float column to string with exactly the same decimal point number as defined in the SQL Server table" – alph Apr 19 '23 at 03:58
  • `SqlBulkCopy` in C# uses an internal `INSERT BULK` command which is very fast, but I don't think it's available in Python – Charlieface Apr 19 '23 at 12:14
  • My data mostly consists of NVARCHAR columns; no floats, just 2 int columns. Another thing to note is that one of the columns holds a JSON doc whose maximum length goes up to 7000 bytes. – user21677797 Apr 19 '23 at 13:21

0 Answers