2

I have been trying to insert data from a DataFrame in Python into a table that already exists in SQL Server. The DataFrame has 90K rows, and I want the fastest way to insert the data into the table. I only have read, write, and delete permissions on the server; I cannot create any tables.

Below is the code that inserts the data, but it is very slow. Please advise.

import pandas as pd
import xlsxwriter
import pyodbc

df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;'
                      'Database=Database Name;'
                      'UID=User ID;'
                      'PWD=Password;'
                      'Trusted_Connection=no;')
cursor= conn.cursor()
#Deleting existing data in SQL Table:- 
cursor.execute("DELETE FROM datbase.schema.TableName")
conn.commit()
#Inserting data in SQL Table:- 
for index,row in df.iterrows():
    cursor.execute("INSERT INTO Table Name([A],[B],[C],) values (?,?,?)", row['A'],row['B'],row['C']) 
conn.commit()
cursor.close()
conn.close()
Nipun Rai
  • There is a `pandas.to_sql` method that might be helpful for you: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html – Timbolt Aug 21 '20 at 13:33
  • @GBouffard I tried using the to_sql method, but it gives an error saying I cannot create a table in the database. That's why I was specifying that I don't have access to create tables, but I can read, delete, or insert. – Nipun Rai Aug 21 '20 at 14:00
  • If you provide the right arguments to the method, (dtypes and if_exists='append') then it will not try to create the table. – Jonathan Nappee Feb 21 '23 at 16:29

5 Answers

3

To insert the data much faster, try using SQLAlchemy and df.to_sql. This requires you to create an engine with SQLAlchemy, and to make things faster, use the option fast_executemany=True.

import urllib.parse
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>;UID=<User ID>;PWD=<Password>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)

with engine.connect() as connection:
    df.to_sql('<table name>', connection, index=False)
NYC Coder
  • Hi, I tried using the above method as well, but the speed is still very slow. @NYC Coder – Nipun Rai Aug 24 '20 at 11:47
  • Consider adding the following options to the to_sql call: if_exists='append', method='multi', chunksize=500. The chunk size will depend on how large or numerous your values are. With this I managed to write 20K rows similar to your data in 7 seconds. – Jonathan Nappee Feb 21 '23 at 16:18
1

Here is the script; I hope this works for you.

import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()

df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')

# Build one parameterized INSERT that covers every DataFrame column
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))

# fast_executemany sends the parameter rows in bulk instead of one round trip per row
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
Jeremy Caney
  • This is a better answer than the iterrows one, but to_sql in my tests was about 100 times faster, although it does require a bit more code. – Jonathan Nappee Feb 21 '23 at 16:58
  • What is df1 here? How are we getting the values? Please explain a little bit. – SAGY Mar 21 '23 at 05:18
-1

This should do what you want...very generic example...

# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# create timer
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=Excel-PC\SQLEXPRESS;'
    r'DATABASE=NORTHWND;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)

cursor = cnxn.cursor()

for index,row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)', 
                    row['Name'], 
                    row['Address'], 
                    row['Age'],
                    row['Work'])
    cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))

Try that and post back if you have additional questions/issues/concerns.
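
If the row-by-row loop above is still too slow (as the comments below suggest), a variant worth trying is pyodbc's executemany with fast_executemany enabled, which sends the parameter rows in batches instead of one INSERT round trip per row. A rough sketch against the same dbo.Table_1 columns used above:

# Sketch: batch every row into a single executemany call instead of looping
cursor = cnxn.cursor()
cursor.fast_executemany = True
cursor.executemany(
    'INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
    df[['Name', 'Address', 'Age', 'Work']].values.tolist())
cnxn.commit()
cursor.close()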

ASH
  • The above code gives me the same speed: it inserts about 70 rows in 60 seconds. I need to insert 1 lakh (100K) rows, which would take more than a day. – Nipun Rai Aug 27 '20 at 10:33
  • It's a cursor so it's going line by line by line by line. It does an insert for each and every line. Take a look at this. https://stackoverflow.com/questions/40060864/speed-up-inserts-into-sql-server-from-pyodbc – ASH Aug 27 '20 at 12:27
  • If that doesn't work for you, maybe you need to re-think your solution. For instance, Databricks can insert, at least, hundreds of thousands, or maybe millions, of records per second, depending on the number of columns you are working with. This is not a free service, and it takes a little knowledge/education to get up to speed with things. – ASH Aug 27 '20 at 12:33
-1

Replace df.iterrows() with df.apply(), for one thing; removing the explicit loop should be much more efficient.
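
A minimal sketch of what that could look like, reusing the cursor and placeholder column names from the question; note that it still executes one INSERT per row, so any speedup over iterrows is limited:

# Sketch: per-row insert via df.apply instead of an explicit iterrows loop
df.apply(
    lambda row: cursor.execute(
        "INSERT INTO database.schema.TableName([A],[B],[C]) values (?,?,?)",
        row['A'], row['B'], row['C']),
    axis=1)
conn.commit()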

Matt
  • You're assuming the iterrows method is the issue here but if you had written a benchmark you'd see it's negligible compared to the insert. – Jonathan Nappee Feb 21 '23 at 16:02
-2

Try populating a temp table with one index or none, then insert it into your target table all at once. That might speed things up, since the indexes don't have to be updated after each individual insert.
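
A rough sketch of that idea, assuming the session is still allowed to create a local temporary table in tempdb (often permitted even without CREATE TABLE rights on the database itself) and reusing the question's placeholder table and column names; the column types here are placeholders as well:

# Sketch: stage the rows in a local temp table, then copy into the target table in one statement
cursor = conn.cursor()
cursor.execute("CREATE TABLE #staging ([A] NVARCHAR(255), [B] NVARCHAR(255), [C] NVARCHAR(255))")  # placeholder column types
# cursor.fast_executemany = True may also help here, depending on the ODBC driver
cursor.executemany(
    "INSERT INTO #staging ([A],[B],[C]) VALUES (?,?,?)",
    df[['A', 'B', 'C']].values.tolist())
cursor.execute("INSERT INTO database.schema.TableName ([A],[B],[C]) SELECT [A],[B],[C] FROM #staging")
conn.commit()
cursor.close()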