
I'm trying to find a faster way to push data to a SQL database using Python. I have tried the dataframe.to_sql() method and cursor.fast_executemany(), but they don't seem to increase the speed with the data I'm working with right now (the data is in CSV files). Someone suggested that I could use named tuples and generators to load the data much faster than pandas can.
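
For reference, my current setup looks roughly like this (the connection string and table name are placeholders, and passing `fast_executemany=True` to `create_engine` requires SQLAlchemy 1.3+):

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder connection string -- real server/database/credentials differ.
engine = sa.create_engine(
    "mssql+pyodbc://user:password@server/database"
    "?driver=SQL+Server+Native+Client+11.0",
    fast_executemany=True,  # hands the flag through to the pyodbc cursor
)

df = pd.read_csv("data.csv")  # files are ~1 GB each
df.to_sql("my_table", engine, if_exists="append", index=False)
```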

[Generally the CSV files are at least 1 GB in size, and it takes around 10-17 minutes to push one file.]

I'm fairly new to many of Python's concepts, so please suggest a method, or at least point me to an article with relevant info. Thanks in advance.

dguy73
  • https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc – mootmoot Apr 25 '19 at 15:48
  • When you say "sql database" do you mean Microsoft SQL Server? If so, are you using Microsoft's "ODBC Driver 17 for SQL Server"? – Gord Thompson Apr 25 '19 at 16:20
  • Yes, my bad, I forgot to mention it: it's SQL Server 17 using **driver: "SQL Server Native Client 11.0"** – dguy73 Apr 25 '19 at 16:36
  • re: "fast_executemany ... [doesn't] seem to increase the speed" - [This pyodbc issue](https://github.com/mkleehammer/pyodbc/issues/500) may be of interest. – Gord Thompson Apr 25 '19 at 19:02

1 Answer


If you are trying to insert the CSV as is into the database (i.e. without doing any processing in pandas), you could use SQLAlchemy in Python to execute a `BULK INSERT [params, file, etc.]`. Alternatively, I've found that reading the CSVs, processing them, writing back to CSV, and then bulk inserting can be an option.
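
A minimal sketch of what that could look like (the connection string, table name, file path, and terminators are placeholders, and the file path has to be visible to the SQL Server machine, not just to the Python client):

```python
import sqlalchemy as sa

engine = sa.create_engine(
    "mssql+pyodbc://user:password@server/database"
    "?driver=SQL+Server+Native+Client+11.0"
)

# Raw string so the terminators reach SQL Server as literal \n etc.
bulk_insert = sa.text(r"""
    BULK INSERT dbo.my_table
    FROM 'C:\data\big_file.csv'
    WITH (
        FIRSTROW = 2,           -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        TABLOCK
    )
""")

with engine.begin() as conn:  # begin() wraps the statement in a transaction
    conn.execute(bulk_insert)
```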

Otherwise, feel free to specify in a bit more detail what you want to accomplish, how you need to process the data before inserting it into the DB, etc.

peels