I'm working with a large pandas DataFrame that needs to be dumped into a PostgreSQL table. From what I've read, it's not a good idea to dump it all at once (and I was locking up the db); it's better to use the chunksize parameter.
The answers here are helpful for the workflow, but I'm asking specifically about how the value of chunksize affects performance.
In [5]: df.shape
Out[5]: (24594591, 4)
In [6]: df.to_sql('existing_table',
                  con=engine,
                  index=False,
                  if_exists='append',
                  chunksize=10000)
Is there a recommended default, and does setting the parameter higher or lower make a difference in performance? Assuming I have the memory to support a larger chunksize, will it execute faster?
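To make the question concrete, here is a minimal timing sketch I could run to compare a few chunk sizes. The connection string, the synthetic 4-column sample, the throwaway table name, and the candidate chunk sizes are all placeholders of my own, not anything pandas recommends:

import time

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; replace with the real database URL.
engine = create_engine('postgresql://user:password@localhost:5432/mydb')

# Synthetic stand-in for a slice of the real DataFrame (4 columns, like df),
# kept small so the benchmark doesn't take hours on ~24.6M rows.
sample = pd.DataFrame(np.random.rand(1_000_000, 4), columns=list('abcd'))

for chunksize in (1_000, 10_000, 100_000):
    start = time.perf_counter()
    sample.to_sql('chunksize_test',       # throwaway table just for timing
                  con=engine,
                  index=False,
                  if_exists='replace',    # overwrite between runs
                  chunksize=chunksize)
    elapsed = time.perf_counter() - start
    print(f'chunksize={chunksize:>7,}: {elapsed:.1f}s')

Would timings from a sample like this scale predictably to the full DataFrame, or does the optimal chunksize change with total volume?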