I am trying to speed up the import of a Pandas DataFrame of shape 2,000,000 x 6
into a Postgres DB.
I am using:

import sqlalchemy
from sqlalchemy import create_engine

engine = create_engine('postgresql://root:password@redshift_postgres.redshift.amazonaws.com:5439/dev')
df.to_sql('table',
          engine,
          if_exists='replace',
          schema='schema_1',
          index=False,
          method='multi',
          chunksize=10000,
          dtype={'column_1': sqlalchemy.types.VARCHAR(length=30),
                 'column_2': sqlalchemy.types.VARCHAR(length=30),
                 'column_3': sqlalchemy.types.VARCHAR(length=200),
                 'column_4': sqlalchemy.Date()})
It has already been running for more than 20 minutes, which I suppose it shouldn't. I've searched for ways to speed up the process, and setting fast_executemany seems to be the recommended approach, but I am not sure I am applying it correctly:
from sqlalchemy import event

@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(
    conn, cursor, statement, params, context, executemany
):
    if executemany:
        cursor.fast_executemany = True

engine = create_engine('postgresql://root:password@redshift_postgres.redshift.amazonaws.com:5439/dev')
df.to_sql('table',
          engine,
          if_exists='replace',
          schema='schema_1',
          index=False,
          dtype={'column_1': sqlalchemy.types.VARCHAR(length=30),
                 'column_2': sqlalchemy.types.VARCHAR(length=30),
                 'column_3': sqlalchemy.types.VARCHAR(length=200),
                 'column_4': sqlalchemy.Date()})
Basically, I just added the event listener function with its decorator, and removed method='multi' and chunksize=10000 from my previous code.
Is there something trivial I am missing? It does not speed up the import at all.
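For context, the other approach I found in the pandas documentation is passing a COPY-based insertion callable via the method= parameter of to_sql. This is a sketch adapted from the docs example; I have not tested it against my endpoint, and it assumes a plain Postgres target with a psycopg2 connection (Redshift's COPY normally loads from S3 rather than STDIN):

```python
from io import StringIO
import csv

def psql_insert_copy(table, conn, keys, data_iter):
    # table: pandas.io.sql.SQLTable; conn: SQLAlchemy connection;
    # keys: list of column names; data_iter: iterable of row tuples.
    dbapi_conn = conn.connection  # raw DBAPI (psycopg2) connection
    with dbapi_conn.cursor() as cur:
        # Serialize all rows into an in-memory CSV buffer.
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        # Stream the buffer into Postgres with a single COPY statement.
        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
```

It would then be used as df.to_sql('table', engine, schema='schema_1', if_exists='replace', index=False, method=psql_insert_copy).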