
I am a newbie to SQL and data management, so your help is greatly appreciated.

Task:

  1. Extract a large amount of data from an API into a Python DataFrame
  2. Handle some data errors
  3. Send it in its entirety to a SQL Server database (managed via MS SQL Server Management Studio)

Currently, I have to .format every row of the DataFrame individually into a SQL statement string and then execute it against SQL Server, as explained on the MS website:

https://learn.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-sql-server?view=sql-server-ver15

This is extremely slow and very error-prone due to the number of columns.

e.g.:

sqlcreateconnection()
sqlStatementList = []
for page in api_pages:
    for item in page:
        sqlStatementList.append(
            "exec [dbo].[SP_Transactions] '{0}','{1}','{2}','{3}','{4}','{5}','{6}',..."
            .format(col1, col2, col3, ...))

sql_executelist(sqlStatementList)

Is there a faster way?

Thanks!!

rpatt97
  • The fastest option is to save the data to a CSV file and insert it using BULK INSERT. You can execute the command directly or use the [bcpy](https://github.com/titan550/bcpy) package. This imports the data as a minimally logged operation, which is far faster than even a fast cursor. A second option is to use [DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html) with SQLAlchemy and [the fast_executemany option](https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc) (rough sketches of both follow below). – Panagiotis Kanavos Oct 14 '20 at 12:11
  • Hi Panagiotis, thank you for your fast and well-informed response. I will give them both a try. – rpatt97 Oct 15 '20 at 09:45
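
A rough sketch of the first suggestion (dump the frame to CSV, then BULK INSERT). The file path, table name, and connection details below are illustrative assumptions, and note that BULK INSERT reads the file from the server's filesystem, so the path must be visible to SQL Server, not just to the client:

import pandas as pd
import pyodbc

df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})  # stand-in for the API data
df.to_csv(r"C:\temp\transactions.csv", index=False, header=False)

# Placeholder connection details; adjust to your server.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes",
    autocommit=True,
)
# The path in the statement is resolved by the SQL Server service, not the client.
conn.execute(
    "BULK INSERT dbo.Transactions "
    "FROM 'C:\\temp\\transactions.csv' "
    "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')"
)
conn.close()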
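
And a minimal sketch of the second suggestion (DataFrame.to_sql with SQLAlchemy's fast_executemany); again, the connection string and table name are placeholders, not values from the question:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust driver/server/database to your setup.
engine = create_engine(
    "mssql+pyodbc://myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes",
    fast_executemany=True,  # pyodbc sends rows in batches instead of one round trip each
)

df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})  # stand-in for the API data

# Appends every row in bulk; no per-row string formatting required.
df.to_sql("Transactions", engine, schema="dbo", if_exists="append", index=False)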

0 Answers