I am creating a database from three .csv files I receive daily. I cleaned them with pandas, and each file now lives in its own DataFrame. The largest is about 100,000 rows by 500 columns, and the other two are similar in size. I need the fastest way to upload each DataFrame to its own PostgreSQL table (three tables in total). This really needs to be fast because I also have to upload about 10 years' worth of data.
I have used sqlalchemy and psycopg2 with df.to_sql.
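This is roughly what the to_sql path looks like; the connection string and table names are placeholders, and df_a, df_b, df_c are the three cleaned DataFrames:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")

def upload(df: pd.DataFrame, table_name: str) -> None:
    df.to_sql(
        table_name,
        engine,
        if_exists="append",  # placeholder; the real script decides append vs. replace
        index=False,
        chunksize=10_000,    # tried several sizes with no noticeable difference
        method="multi",      # tried the default executemany as well; same speed
    )

# df_a, df_b, df_c are the three cleaned DataFrames (~100,000 rows x 500 columns each)
for df, table in [(df_a, "table_a"), (df_b, "table_b"), (df_c, "table_c")]:
    upload(df, table)
```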
I have also tried converting each DataFrame to a CSV and uploading it in batch. Uploading a CSV was the fastest, but it constantly raised type errors and crashed.
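A simplified sketch of that CSV route (stripped down here, shown with psycopg2's copy_expert; connection details and table names are placeholders):

```python
import io
import pandas as pd
import psycopg2

def copy_csv(df: pd.DataFrame, table_name: str) -> None:
    # Write the frame to an in-memory CSV, then stream it into Postgres with COPY
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)

    # Placeholder connection details
    conn = psycopg2.connect(host="localhost", dbname="mydb",
                            user="user", password="password")
    try:
        with conn.cursor() as cur:
            # This is the step that kept raising type errors and crashing
            cur.copy_expert(f"COPY {table_name} FROM STDIN WITH (FORMAT csv)", buf)
        conn.commit()
    finally:
        conn.close()
```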
The sqlalchemy/psycopg2 route with df.to_sql works reliably, but all three uploads together take an hour or more, and the chunksize, fast_executemany, and method='multi' keywords all yielded about the same speed. I was also looking at asyncpg, but I am confused about how to implement it.
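This is as far as I got with asyncpg; copy_records_to_table looked like the right tool, but I'm not sure this is the correct pattern (the DSN and table name are placeholders, and df_a is one of the cleaned DataFrames):

```python
import asyncio
import asyncpg
import pandas as pd

async def copy_df(df: pd.DataFrame, table_name: str) -> None:
    # Placeholder DSN
    conn = await asyncpg.connect("postgresql://user:password@localhost:5432/mydb")
    try:
        # copy_records_to_table uses COPY under the hood, which is why it looked
        # promising, but I'm not sure how NaN/NaT values should be handled here
        await conn.copy_records_to_table(
            table_name,
            records=df.itertuples(index=False, name=None),  # plain tuples
            columns=list(df.columns),
        )
    finally:
        await conn.close()

# df_a is one of the three cleaned DataFrames; "table_a" is a placeholder
asyncio.run(copy_df(df_a, "table_a"))
```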
Please, any help or advice is greatly appreciated.