I am currently using SQLAlchemy to write a pandas dataframe to a PostgreSQL database on an AWS server. My code looks like this:
import pickle
from sqlalchemy import create_engine

engine = create_engine(
    'postgresql://{}:{}@{}:{}/{}'.format(ModelData.user, ModelData.password, ModelData.host,
                                         ModelData.port, ModelData.database), echo=True)

# Load the pickled dataframe, then write it in multi-row INSERT batches of 1,000 rows.
with open(file, 'rb') as f:
    df = pickle.load(f)

df.to_sql(table_name, engine, method='multi', if_exists='replace', index=False, chunksize=1000)
The table I am writing has about 900,000 rows and 500 columns, and the write takes a very long time to complete; sometimes I wait all day and it still isn't finished. Is there a faster way to write this data? To reiterate, this question is about speed, not about getting the code to run (it runs fine, it's just slow). Any help would be appreciated!
Note: The machine I'm using has 32 GB of RAM, an i7 processor, 1 TB of storage, and a GPU, so I don't think the bottleneck is my machine.
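For a sense of scale, my understanding is that method='multi' packs many rows into each INSERT statement, so with chunksize=1000 every statement carries all 500 columns for 1,000 rows. Rough arithmetic on the numbers above (just a sketch to quantify the volume being sent):

# Rough volume of the to_sql call above (row/column counts taken from this post).
n_rows, n_cols, chunksize = 900_000, 500, 1000

n_statements = n_rows // chunksize          # 900 multi-row INSERT statements
values_per_statement = chunksize * n_cols   # 500,000 values bound per statement
total_values = n_rows * n_cols              # 450,000,000 values sent in total

print(n_statements, values_per_statement, total_values)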