More Efficient Way To Insert Dataframe into SQL Server

Question

I am trying to update a SQL table with updated information which is in a dataframe in pandas.

I have about 100,000 rows to iterate through and it's taking a long time. Any way I can make this code more efficient. Do I even need to truncate the data? Most rows will probably be the same.

 conn = pyodbc.connect ("Driver={xxx};"
            "Server=xxx;"
            "Database=xxx;"
            "Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute('TRUNCATE dbo.Sheet1$') 

for index, row in df_union.iterrows():
    print(row)
    cursor.execute("INSERT INTO dbo.Sheet1$ (Vendor, Plant) values(?,?)", row.Vendor, row.Plant)

Update: This is what I ended up doing.

 params = urllib.parse.quote_plus(r'DRIVER={xxx};SERVER=xxx;DATABASE=xxx;Trusted_Connection=yes')

conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str)
df = pd.read_excel('xxx.xlsx')
print("loaded")
df.to_sql(name='tablename',schema= 'dbo', con=engine, if_exists='replace',index=False, chunksize = 1000, method = 'multi')

[pd.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)? Set the chunk size as well. — Quang Hoang, Dec 02 '21 at 16:05
Does this answer your question? [How to speed up bulk insert to MS SQL Server using pyodbc](https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc) — Charlieface, Dec 02 '21 at 20:18
Kind of. I ended up figuring it out and updated the question. I appreciate your help and time. — dan, Dec 04 '21 at 22:39

Enrique Flores · Answer 1 · 2021-12-02T17:36:42.487

0

Don't use for or cursors just SQL

insert into TABLENAMEA (A,B,C,D) 
select A,B,C,D from TABLENAMEB

Take a look to this link to see another demo: https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-into-select/

You just need to update this part to run a normal insert

conn = pyodbc.connect ("Driver={xxx};"
            "Server=xxx;"
            "Database=xxx;"
            "Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute('insert into TABLENAMEA (A,B,C,D) select A,B,C,D from TABLENAMEB')

You don't need to store the dataset in a variable, just run the query directly as normal SQL, performance will be better than a iteration

edited Dec 02 '21 at 17:36

answered Dec 02 '21 at 16:10

Enrique Flores

716
4
13

Thank you, but it needs to be done in Python. – dan Dec 02 '21 at 16:15
I will share an example, give me some minutes – Enrique Flores Dec 02 '21 at 16:31
Thank you Enrique! – dan Dec 02 '21 at 16:36

More Efficient Way To Insert Dataframe into SQL Server

1 Answers1