I have a dataframe with 59 million rows that I want to upload to our SQL Server database, but the upload keeps timing out. What is the best way to upload very large pandas dataframes to SQL Server?
Thanks in advance!
Here is my code:
import pandas as pd
import requests
import pyodbc
import sys
import os
import urllib
from sqlalchemy import create_engine
import sqlalchemy as sa
from urllib.request import Request, urlopen
def sql_connect(account, password):
    # Build an ODBC connection string and wrap it in a SQLAlchemy engine
    params = urllib.parse.quote_plus("DRIVER=SQL Server;"
                                     "SERVER=servername;"
                                     "DATABASE=databasename;"
                                     "UID={};"
                                     "PWD={}".format(account, password))
    engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params),
                              fast_executemany=True)
    return engine

def push_new_data(self, engine, df_to_push, formatDict):
    # Method on my uploader class; self.table_name holds the target table
    if not df_to_push.empty:
        df_to_push.to_sql(name=self.table_name,
                          con=engine,
                          if_exists='append',
                          chunksize=20,
                          index=False,
                          method='multi',
                          dtype=formatDict)
    else:
        print('No new data was added')
Based on the answers given here --> python pandas to_sql with sqlalchemy : how to speed up exporting to MS SQL? I tried setting fast_executemany=True to speed things up, but while reading the documentation I saw the following warning:
The pyodbc fast_executemany mode buffers all rows in memory and is not compatible with very large batches of data. A future version of SQLAlchemy may support this flag as a per-execution option instead.
This makes me think it might not be appropriate for my purposes.
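One workaround I have been considering is slicing the dataframe myself and calling to_sql once per slice, so pyodbc only ever has to buffer one slice of rows per insert. Here is a rough sketch of that idea (the helper name and the 50,000-row chunk size are just placeholders I picked, not values from the documentation):

def push_in_chunks(engine, df_to_push, table_name, formatDict, rows_per_chunk=50000):
    # Slice the frame into fixed-size pieces and append each one separately,
    # so each to_sql call only hands rows_per_chunk rows to pyodbc at a time
    for start in range(0, len(df_to_push), rows_per_chunk):
        chunk = df_to_push.iloc[start:start + rows_per_chunk]
        chunk.to_sql(name=table_name,
                     con=engine,
                     if_exists='append',
                     index=False,
                     dtype=formatDict)
        print('pushed rows {} to {}'.format(start, start + len(chunk)))

The idea is just to keep each executemany batch small while still using fast_executemany.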