
I am creating a database from three .csv files I receive daily. I cleaned them with pandas, and each now lives in its own pandas DataFrame. The largest is about 100,000 rows by 500 columns, and the others are similar in size. I need the fastest way to upload each DataFrame to one of three different tables in PostgreSQL. This really needs to be fast because I also have to upload about 10 years' worth of data.

I have used SQLAlchemy and psycopg2 with `df.to_sql`, as well as converting each DataFrame to a CSV and uploading it in batch. Uploading a CSV was the fastest, but it would constantly raise type errors and crash. SQLAlchemy and psycopg2 worked perfectly but take about an hour or more for all three tables to load. The `chunksize`, `fast_executemany`, and `method='multi'` options all yielded about the same speed. I was looking at using asyncpg, but I am confused about how to implement it.
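For reference, a minimal sketch of the CSV-over-COPY approach done in memory rather than via a temp file (this assumes a `psycopg2` connection and a target table whose columns already match the DataFrame; the explicit NULL marker is one way to avoid the type errors that empty CSV fields cause in non-text columns):

```python
import io
import psycopg2


def copy_from_dataframe(df, table_name, conn):
    """Stream a DataFrame into an existing table via COPY, no intermediate file."""
    buf = io.StringIO()
    # Write CSV into memory; an explicit NULL marker stops empty fields
    # from being loaded as empty strings into non-text columns.
    df.to_csv(buf, index=False, header=False, na_rep="\\N")
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert(
            f"COPY {table_name} FROM STDIN WITH (FORMAT csv, NULL '\\N')",
            buf,
        )
    conn.commit()


# Hypothetical usage; the DSN and table name are placeholders:
# conn = psycopg2.connect("dbname=mydb user=me")
# copy_from_dataframe(df, "table_one", conn)
```

And since asyncpg came up: its `copy_records_to_table` wraps binary COPY, so the usual pattern is roughly the following (a sketch; the DSN and table name are placeholders, and NaN has to be converted to None first because binary COPY is strict about types):

```python
import asyncio
import asyncpg


async def copy_with_asyncpg(df, table_name, dsn):
    # Binary COPY rejects NaN in non-float columns, so replace it with None.
    clean = df.astype(object).where(df.notna(), None)
    conn = await asyncpg.connect(dsn)
    try:
        await conn.copy_records_to_table(
            table_name,
            records=clean.itertuples(index=False, name=None),
            columns=list(clean.columns),
        )
    finally:
        await conn.close()


# Hypothetical usage:
# asyncio.run(copy_with_asyncpg(df, "table_one", "postgresql://me@localhost/mydb"))
```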

Please, any help or advice is greatly appreciated.

  • Maybe a simple export to an SQL dump (a file with one large SQL INSERT statement per file) would help. You could then use `psql` and multiple processes to import such a dump into your database. `psql` should be the fastest way possible, imho. – Drey Jul 22 '19 at 15:57
  • Do you mean you have 500 columns? If that's true, you might think about normalising your data before putting it into the database. PG certainly supports that, but it's a long way from recommended usage. – Sam Mason Jul 22 '19 at 17:36
  • Also, just eyeballing the numbers: 100k rows × 4,000 days × 4 kB per row (at best, with 500 columns) is about 1.5 TB, which means you're probably at the point where it's worth partitioning the table, maybe by year (see the sketch after these comments). – Sam Mason Jul 22 '19 at 17:41
  • This answer to another post helped a great deal: [link](https://stackoverflow.com/a/47984180/5476158) – rte Jul 22 '19 at 17:50
  • @SamMason Yes, I have about 500 columns in each file. Unfortunately, all are necessary. But your suggestion about partitioning by year is very helpful and one I had not considered. Thank you! – rte Jul 22 '19 at 17:55
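To make the partitioning suggestion concrete, here is a minimal sketch of declarative range partitioning by year (PostgreSQL 10+; the table and column names are hypothetical):

```python
import psycopg2

# Hypothetical parent table partitioned by a date column; COPY/INSERT into
# the parent routes each row to the matching yearly partition automatically.
DDL = """
CREATE TABLE IF NOT EXISTS readings (
    reading_date date NOT NULL
    -- ... the remaining ~500 data columns ...
) PARTITION BY RANGE (reading_date);

CREATE TABLE IF NOT EXISTS readings_2019 PARTITION OF readings
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
"""

with psycopg2.connect("dbname=mydb") as conn:  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```

At the sizes estimated in the comments, one partition per year works out to roughly 150 GB per child table, and a year's backfill can be loaded into (or dropped from) a single partition without touching the rest.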

0 Answers