I'm trying to write 300,000 rows to a PostgreSQL database with pandas.to_sql and SQLAlchemy. The rows are mostly string columns plus a couple of JSON columns (~25 columns total).
The current implementation takes ~17 seconds per 5k rows, or ~1200 seconds for all 300k rows. Is there a way to improve performance here? Current implementation:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import JSON

# psycopg2 "batch" executemany mode, sending 2,500 rows per round trip
db = create_engine(pg_key, executemany_mode="batch", executemany_batch_page_size=2500)

df = pd.read_csv('data.csv')
df.to_sql(
    'table_name',
    con=db,
    index=False,
    dtype={'column_a': JSON(), 'column_b': JSON()},
    if_exists='replace',
)