I am trying to use pandas' to_sql method to upload multiple CSV files to their respective tables in a SQL Server database by looping through them.
import pandas as pd

# engine is my existing SQLAlchemy engine; the connection itself works fine
fileLoc = r'C:\Users\hcole\Downloads\stats.csv\\'
dfArray = ['file1', 'file2', 'file3', 'file4']
for name in dfArray:
    df = pd.read_csv(fileLoc + name + '.csv')
    df.columns = df.columns.str.replace(' ', '')  # strip spaces from column names
    df.to_sql(name, engine, if_exists='append', index=False)
My connection string and database connection are working fine, and I make it through the first few (small) files without issue. But as soon as I hit file4, which contains ~135k rows, it takes nearly an hour to upload all of the data to the database. I've tried downgrading to pandas 0.22 after reading the documentation on the "chunksize" argument of to_sql, but that hasn't sped things up.
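For reference, the chunked call I tried looked roughly like this (a sketch; the chunk size of 1000 is just an illustrative value, not something I've tuned):

# write rows in batches rather than in one go;
# 1000 rows per batch is an arbitrary example value
df.to_sql(name, engine, if_exists='append', index=False, chunksize=1000)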
Any tips on how to improve the speed would be appreciated. Thank you.