
I'm trying to write 300,000 rows to a PostgreSQL database with pandas.to_sql and SQLAlchemy. The rows contain some JSON but are mainly string columns (~25 columns total).

Current implementation takes ~17 seconds for every 5k rows, or ~1200 seconds for 300k rows. Is there a way to improve performance here? Current implementation:

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import JSON

# pg_key is the PostgreSQL connection URL; the executemany options tell the
# psycopg2 dialect to send the INSERTs in batches of 2,500 rows.
db = create_engine(pg_key, executemany_mode="batch", executemany_batch_page_size=2500)

df = pd.read_csv('data.csv')

df.to_sql(
    'table_name',
    con=db,
    index=False,
    dtype={'column_a': JSON(), 'column_b': JSON()},
    if_exists='replace'
)
– L M
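
For reference, to_sql also accepts chunksize and method arguments, which are the usual first knobs to try when inserts are slow. Below is a minimal sketch reusing the db, df and JSON columns from the question; the chunksize value of 5,000 is only an illustrative guess, and whether method='multi' actually helps depends on the driver and on how wide the rows are.

# Same write as above, but buffer 5,000 rows per round trip and send a
# multi-row VALUES clause per INSERT instead of one row per statement.
# chunksize=5000 is an illustrative value, not a measured optimum.
df.to_sql(
    'table_name',
    con=db,
    index=False,
    dtype={'column_a': JSON(), 'column_b': JSON()},
    if_exists='replace',
    chunksize=5000,
    method='multi'
)
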
  • what database flavour? – sammywemmy Aug 15 '21 at 23:03
  • Does this answer your question? [Bulk Insert A Pandas DataFrame Using SQLAlchemy](https://stackoverflow.com/questions/31997859/bulk-insert-a-pandas-dataframe-using-sqlalchemy) – Code Different Aug 15 '21 at 23:39
  • Thanks both, the DB is Postgres. Re the linked question, it doesn't use to_sql, which is what I'm hoping to use. Has anyone had any luck with speeding up to_sql? – L M Aug 18 '21 at 11:12
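
Following up on the last comment: for PostgreSQL specifically, the pandas documentation describes a COPY-based insertion method that can be passed to to_sql through its method parameter, so the to_sql interface stays the same while the actual load goes through COPY ... FROM STDIN. The sketch below is adapted from that recipe and assumes a psycopg2 connection plus the same engine, dtypes and table name as in the question.

import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    # Custom to_sql insertion method: stream the rows through PostgreSQL's
    # COPY ... FROM STDIN instead of issuing individual INSERT statements.
    # table is a pandas.io.sql.SQLTable, conn a SQLAlchemy connection,
    # keys the list of column names, data_iter an iterable of row tuples.
    dbapi_conn = conn.connection  # raw psycopg2 connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        cur.copy_expert(
            sql='COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns),
            file=buf,
        )

df.to_sql(
    'table_name',
    con=db,
    index=False,
    dtype={'column_a': JSON(), 'column_b': JSON()},
    if_exists='replace',
    method=psql_insert_copy
)

One caveat: COPY in CSV form only works if the JSON columns serialize to text Postgres will accept, so dict values may need to be dumped to JSON strings (json.dumps) before calling to_sql.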

0 Answers