
I am trying to speed up importing a Pandas DataFrame of shape 2,000,000 x 6 into a Postgres DB.

I am using:

import sqlalchemy
from sqlalchemy import create_engine

engine = create_engine('postgresql://root:password@redshift_postgres.redshift.amazonaws.com:5439/dev')
df.to_sql('table',
          engine,
          if_exists='replace',
          schema='schema_1',
          index=False,
          method='multi',
          chunksize=10000,
          dtype={'column_1': sqlalchemy.types.VARCHAR(length=30),
                 'column_2': sqlalchemy.types.VARCHAR(length=30),
                 'column_3': sqlalchemy.types.VARCHAR(length=200),
                 'column_4': sqlalchemy.Date()})

It has already been running for more than 20 minutes, which I suppose it shouldn't. I've searched for ways to speed the process up, and `fast_executemany` seems to be the correct approach, but I am not sure I am applying it correctly:

import sqlalchemy
from sqlalchemy import create_engine, event

engine = create_engine('postgresql://root:password@redshift_postgres.redshift.amazonaws.com:5439/dev')

@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

df.to_sql('table',
          engine,
          if_exists='replace',
          schema='schema_1',
          index=False,
          dtype={'column_1': sqlalchemy.types.VARCHAR(length=30),
                 'column_2': sqlalchemy.types.VARCHAR(length=30),
                 'column_3': sqlalchemy.types.VARCHAR(length=200),
                 'column_4': sqlalchemy.Date()})

Basically I just added the decorator and the listener function, and removed method='multi' and chunksize=10000 from my previous code.

Is there something trivial I am missing? It does not speed up the import at all.
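For reference, the other approach I found while searching is the COPY-based insertion method from the pandas docs, which passes a callable as `method=` to `to_sql`. A minimal sketch, assuming a real Postgres server with a psycopg2 driver (the `schema_1`/column names are from my code above; note my host is actually Redshift, whose COPY may behave differently):

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    """Bulk-load rows with PostgreSQL COPY instead of row-by-row INSERTs.

    table     -- pandas SQLTable object (has .name and .schema)
    conn      -- SQLAlchemy connection wrapping a psycopg2 connection
    keys      -- list of column names
    data_iter -- iterable of row tuples to insert
    """
    dbapi_conn = conn.connection  # raw psycopg2 connection
    with dbapi_conn.cursor() as cur:
        # Serialize the chunk to an in-memory CSV buffer.
        buf = StringIO()
        writer = csv.writer(buf)
        writer.writerows(data_iter)
        buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name
        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=buf)

# Would be used as:
# df.to_sql('table', engine, schema='schema_1', if_exists='replace',
#           index=False, method=psql_insert_copy)
```

I have not benchmarked this yet; it is taken almost verbatim from the "Insertion method" example in the pandas `to_sql` documentation.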

Jonas Palačionis
