1

I am trying to update a SQL table with updated information which is in a dataframe in pandas.

I have about 100,000 rows to iterate through and it's taking a long time. Any way I can make this code more efficient. Do I even need to truncate the data? Most rows will probably be the same.

 conn = pyodbc.connect ("Driver={xxx};"
            "Server=xxx;"
            "Database=xxx;"
            "Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute('TRUNCATE dbo.Sheet1$') 

for index, row in df_union.iterrows():
    print(row)
    cursor.execute("INSERT INTO dbo.Sheet1$ (Vendor, Plant) values(?,?)", row.Vendor, row.Plant)

Update: This is what I ended up doing.

 params = urllib.parse.quote_plus(r'DRIVER={xxx};SERVER=xxx;DATABASE=xxx;Trusted_Connection=yes')

conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str)
df = pd.read_excel('xxx.xlsx')
print("loaded")
df.to_sql(name='tablename',schema= 'dbo', con=engine, if_exists='replace',index=False, chunksize = 1000, method = 'multi')
dan
  • 62
  • 8
  • 1
    [pd.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)? Set the chunk size as well. – Quang Hoang Dec 02 '21 at 16:05
  • Does this answer your question? [How to speed up bulk insert to MS SQL Server using pyodbc](https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc) – Charlieface Dec 02 '21 at 20:18
  • Kind of. I ended up figuring it out and updated the question. I appreciate your help and time. – dan Dec 04 '21 at 22:39

1 Answers1

0

Don't use for or cursors just SQL

insert into TABLENAMEA (A,B,C,D) 
select A,B,C,D from TABLENAMEB

Take a look to this link to see another demo: https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-into-select/

You just need to update this part to run a normal insert

conn = pyodbc.connect ("Driver={xxx};"
            "Server=xxx;"
            "Database=xxx;"
            "Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute('insert into TABLENAMEA (A,B,C,D) select A,B,C,D from TABLENAMEB') 

You don't need to store the dataset in a variable, just run the query directly as normal SQL, performance will be better than a iteration

Enrique Flores
  • 716
  • 4
  • 13