
I am attempting to load a temporary table with two fields, for a later update join against about 100K records.

I attempted to use ORM methods, but that was entirely too slow, so I switched to Core. However, it now runs for over an hour without finishing. I suspect I am missing something basic. Is there something I need to do to trigger the temp table creation prior to the insert?

        ## CREATE AND POPULATE TABLE WITH CACHED CHANGES
        toCostCode = Table(
            'cache_tbl', metadata,
            Column('id', Text, primary_key=True),
            Column('new_cost_code', Integer),
            prefixes=["TEMPORARY"],
        )
        toCostCode.create(engine)
        logger.syslog('info', "mark - created temp table")
        engine.execute(toCostCode.insert(), toCostCodeCache)
        logger.syslog('info', "mark - populated temp table")
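For comparison, here is a self-contained sketch of the same Core executemany pattern (`conn.execute(table.insert(), list_of_dicts)`) against an in-memory SQLite database — not the poster's code, just a minimal reproduction to confirm the insert form itself scales; the table and column names are borrowed from the snippet above, everything else is illustrative:

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text, text

engine = create_engine("sqlite://")
metadata = MetaData()

cache_tbl = Table(
    "cache_tbl", metadata,
    Column("id", Text, primary_key=True),
    Column("new_cost_code", Integer),
)
metadata.create_all(engine)

# 100K rows as a list of dicts, the same shape toCostCodeCache presumably has.
rows = [{"id": str(i), "new_cost_code": i} for i in range(100_000)]

with engine.begin() as conn:
    # Passing a list of dicts makes this a single executemany call,
    # not 100K separate round trips.
    conn.execute(cache_tbl.insert(), rows)
    count = conn.execute(text("SELECT COUNT(*) FROM cache_tbl")).scalar()

print(count)
```

Locally this completes in a couple of seconds, which suggests the slowness in the original is on the PostgreSQL/psycopg2 side (per-row INSERT statements over the network) rather than in the Core insert construct itself.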

It reports that about 107K records will be updated and then sits on `engine.execute` indefinitely. I can generate a table of the same size in under 1 second with

        drop table t_random;
        create table t_random as
        select md5(random()::text) as "id", s as "new_cost_code"
        from generate_series(1, 100000) s;

How can I make SQLAlchemy perform here, and is there a good URL/RTFM on this? I am finding this SQL toolkit much harder than what I have encountered in other languages, and my Google-fu has failed me so far.
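One route the linked answers suggest is bypassing executemany entirely and streaming the rows in with PostgreSQL's `COPY`, via the raw psycopg2 cursor. A hedged sketch, assuming psycopg2 underneath — `cache_tbl` and its columns come from the snippet above, the function names are mine. Note that `TEMPORARY` tables are session-local, so the `COPY` must run on the same checked-out connection that created the table (e.g. the one behind `engine.raw_connection()`):

```python
import io

def rows_to_copy_buffer(rows):
    """Render [{'id': ..., 'new_cost_code': ...}, ...] as tab-separated
    text, matching COPY's default "text" format."""
    buf = io.StringIO()
    for row in rows:
        buf.write("{}\t{}\n".format(row["id"], row["new_cost_code"]))
    buf.seek(0)
    return buf

def copy_rows(raw_conn, rows):
    """raw_conn: the psycopg2 connection underlying the SQLAlchemy
    engine, e.g. engine.raw_connection() -- it must be the same
    connection/session that created the temp table."""
    with raw_conn.cursor() as cur:
        cur.copy_expert(
            "COPY cache_tbl (id, new_cost_code) FROM STDIN",
            rows_to_copy_buffer(rows),
        )
    raw_conn.commit()
```

`COPY` is a single streaming statement, so 100K rows should land in roughly the same time as the `generate_series` baseline above.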

  • 100k shouldn't take that long, so perhaps something's amiss, but just in case you could try using `COPY`: https://stackoverflow.com/questions/56761442/sqlalchemy-set-membership-for-very-large-sets/56781328, https://stackoverflow.com/questions/13125236/sqlalchemy-psycopg2-and-postgresql-copy, https://stackoverflow.com/questions/43317376/how-can-i-use-psycopg2-extras-in-sqlalchemy. Also have you tried enabling batch mode: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-batch-mode-fast-execution – Ilja Everilä Aug 04 '19 at 07:17
  • In addition, are you using a local DB or remote? The default DB-API library used by SQLAlchemy for Postgresql is psycopg2. Its `executemany()` implementation is slow, essentially a loop of separate `INSERT` statements, so network lag might become very noticeable with a 100k inserts. – Ilja Everilä Aug 04 '19 at 07:33
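The batch-mode suggestion in the first comment amounts to an engine-level setting on the psycopg2 dialect. A configuration sketch (the DSN is a placeholder; `executemany_mode='values'` is available from SQLAlchemy 1.3.7, with `use_batch_mode=True` as the earlier 1.3 equivalent):

```python
from sqlalchemy import create_engine

# Rewrites executemany into paged multi-row INSERT ... VALUES statements
# instead of one INSERT per row over the wire.
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",  # hypothetical DSN
    executemany_mode="values",
    executemany_values_page_size=10000,
)
```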

0 Answers