
I am looking for suggestions on best practices for inserting a large number of records from a Pandas DataFrame into a SQL Server database. What I have works, but I notice that whenever I run my Python script the CPU usage on the server goes up to 99%, and looking at system resources, Python is consuming most of them. Python and SQL Server are running on the same box (Windows 2016).

I was hoping someone could give me some advice on how I can implement this better.

This is my code:

from sqlalchemy import create_engine
import pyodbc
import pandas as pd

# Read the large CSV in chunks and keep only the active rows.
iterator = pd.read_csv('large_file.csv', encoding='utf-8', iterator=True, chunksize=1000)

df = pd.concat([chunk[chunk.active == True] for chunk in iterator])

# At this point the DataFrame df has several million rows.

engine = create_engine('mssql+pyodbc://server_name/database_name?driver=SQL+Server&Trusted_Connection=yes')

df.to_sql('myTable', con=engine, if_exists='append', index=False, chunksize=1000)

I have tested this, and the problem is not reading the large file; that takes some time, but resource usage stays low. The CPU usage goes through the roof when the SQL insertion step begins.

Thank you!

  • I would really suggest using pyodbc directly and writing your own function to bulk upload. pandas can be very slow at uploading huge amounts of data, especially if the datatypes are not defined, and it can use a lot of resources. Speaking from recent project experience. – Yash Sep 16 '20 at 16:55
  • OK, point taken. Since I do a lot more processing on the data where I do need Pandas, what would the best approach be? Write to a csv with Pandas and then upload directly with pyodbc? Thanks! – Wilmar Sep 16 '20 at 18:22
  • Precisely. This [link](https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-from-csv-using-pyodbc) might help you (see the sketch below). – Yash Sep 16 '20 at 18:30
  • Excellent. Thank you for the pointer. – Wilmar Sep 16 '20 at 20:28
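Following up on the comment thread, below is a minimal sketch of the CSV-then-BULK-INSERT approach from the linked question, not a drop-in answer. It assumes `dbo.myTable` already exists with columns matching the CSV, that `df` is the filtered DataFrame from the question, and that the file path and driver name are placeholders to adjust.

```python
import pyodbc

# df is the filtered DataFrame built in the question.
# Hypothetical path; BULK INSERT reads the file on the server side, so the
# SQL Server service account must be able to read it. Python and SQL Server
# run on the same box here, so a local path works.
csv_path = r'C:\temp\filtered_rows.csv'

# Write the rows once; no header row, so BULK INSERT can start at the first line.
df.to_csv(csv_path, index=False, header=False)

conn = pyodbc.connect(
    'DRIVER={SQL Server};SERVER=server_name;'
    'DATABASE=database_name;Trusted_Connection=yes;',
    autocommit=True,
)
cursor = conn.cursor()

# Let SQL Server ingest the file in bulk instead of pandas issuing
# row-by-row INSERTs. The terminators must match how the CSV was written.
cursor.execute(f"""
    BULK INSERT dbo.myTable
    FROM '{csv_path}'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK);
""")
conn.close()
```

With this split, pandas only does the filtering and the CSV write, and the heavy lifting of the load is done by SQL Server's own bulk loader rather than by the Python process.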

0 Answers