I have a dataframe with 59 million rows that I want to upload to our SQL Server database, but the upload keeps timing out. What is the best way to upload very large pandas dataframes to SQL Server?
Thanks in advance!
Here is my code:
import pandas as pd
import requests
import pyodbc
import sys
import os
import urllib
from sqlalchemy import create_engine
import sqlalchemy as sa
from urllib.request import Request, urlopen
def sql_connect(account, password):
    # Build an ODBC connection string and wrap it in a SQLAlchemy engine
    params = urllib.parse.quote_plus("DRIVER=SQL Server;"
                                     "SERVER=servername;"
                                     "DATABASE=databasename;"
                                     "UID={};"
                                     "PWD={}".format(account, password))
    engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params),
                              fast_executemany=True)
    return engine

def push_new_data(self, engine, df_to_push, formatDict):
    # Method on my uploader class; self.table_name holds the target table
    if not df_to_push.empty:
        df_to_push.to_sql(name=self.table_name,
                          con=engine,
                          if_exists='append',
                          chunksize=20,
                          index=False,
                          method='multi',
                          dtype=formatDict)
    else:
        print('No new data was added')
Based on the answers given here --> python pandas to_sql with sqlalchemy : how to speed up exporting to MS SQL? I tried setting fast_executemany=True to speed things up, but while reading the documentation I saw the following warning:
The pyodbc fast_executemany mode buffers all rows in memory and is not compatible with very large batches of data. A future version of SQLAlchemy may support this flag as a per-execution option instead.
This makes me think it might not be appropriate for my purposes.
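One workaround I have been considering is slicing the dataframe myself and calling to_sql once per slice, so pyodbc only ever has to buffer one slice of rows per insert. Here is a rough sketch of that idea (the helper name and the 50,000-row chunk size are just placeholders I picked, not values from the documentation):

def push_in_chunks(engine, df_to_push, table_name, formatDict, rows_per_chunk=50000):
    # Slice the frame into fixed-size pieces and append each one separately,
    # so each to_sql call only hands rows_per_chunk rows to pyodbc at a time
    for start in range(0, len(df_to_push), rows_per_chunk):
        chunk = df_to_push.iloc[start:start + rows_per_chunk]
        chunk.to_sql(name=table_name,
                     con=engine,
                     if_exists='append',
                     index=False,
                     dtype=formatDict)
        print('pushed rows {} to {}'.format(start, start + len(chunk)))

The idea is just to keep each executemany batch small while still using fast_executemany.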