(ANSWERED) My answer to this is down below; I hope it helps.
I am quite new to SQLAlchemy and to Python as a whole, and I am looking for some advice. I need to move data from one Postgres DB to another Postgres DB: roughly 20M+ records, and I would like the job to run daily. I would like to know:
- Should I use SQLAlchemy core or the ORM? (I have mostly used the core so far)
- I am currently using SQLAlchemy 1.3.23. Should I move to 1.4/2.x?
- How do I optimize the inserts to run faster? (I have heard there might be flags I need to enable; see the sketch after the next paragraph for what I mean.)
Unfortunately, I can't use psycopg2's COPY function because I do not have superuser access to the DB.
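For reference, the "flags" I have read about seem to be the psycopg2 executemany settings on create_engine(). A minimal sketch of what I mean, assuming SQLAlchemy 1.4 with the psycopg2 driver and a placeholder connection string (I am not sure these are the right settings for my case):

from sqlalchemy import create_engine

# placeholder DSN; executemany_mode and executemany_values_page_size are
# psycopg2-dialect options in SQLAlchemy 1.4 that batch multi-row INSERTs
engine = create_engine(
    "postgresql+psycopg2://user:password@host/dbname",
    executemany_mode="values_plus_batch",
    executemany_values_page_size=10000,
)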
I am trying to follow someone else's Stack Overflow example (the example I am following):
q = select([table_1])
proxy = conn_p.execution_options(stream_results=True).execute(q)

while 'batch not empty':  # equivalent of 'while True', but clearer
    batch = proxy.fetchmany(100000)  # 100,000 rows at a time

    if not batch:
        break

    for row in batch:
        ???

proxy.close()
The part I get stuck on is the for loop. How do I write the data to the next DB? What function(s) should I use?
Is this the best approach or have I gone horribly wrong?
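While reading the 1.4 docs I also noticed Result.partitions(), which I think does the fetchmany() batching for me on the read side. A rough sketch, reusing q, conn_p and table_1 from my code, with a placeholder where the write would go (I have not measured whether it changes anything):

q = select([table_1])
with conn_p.connect() as src:
    result = src.execution_options(stream_results=True).execute(q)
    for batch in result.partitions(10000):  # lists of up to 10,000 Row objects
        ...  # write the batch to the destination here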
My current iteration of code using version 1.4:
conn_p = create_engine(--db connection string--, echo=True)
conn_sl = create_engine(--db connection string--, echo=False)

q = select([table_1])
proxy = conn_p.execution_options(stream_results=True).execute(q)

while 'batch not empty':
    batch = proxy.fetchmany(10000)
    list1 = []

    if not batch:
        break

    for row in batch:
        d = dict(row.items())
        list1.append(d)

    insert_stmt = table_2.insert().values(list1)
    conn_sl.execute(insert_stmt)

proxy.close()
It is still very slow; it takes about 15 seconds to move 10k records. Any advice?
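For what it is worth, the variant I am experimenting with now keeps the same read loop but passes the row dictionaries to execute() as a separate parameter list instead of insert().values(list1); as I understand it, that lets the driver do an executemany instead of compiling one huge multi-VALUES statement. A sketch, assuming SQLAlchemy 1.4 and reusing conn_p, conn_sl, table_1 and table_2 from above (dest and rows are my own names):

q = select([table_1])
proxy = conn_p.execution_options(stream_results=True).execute(q)

with conn_sl.begin() as dest:  # one connection/transaction on the destination engine
    while True:
        batch = proxy.fetchmany(10000)
        if not batch:
            break
        rows = [dict(row._mapping) for row in batch]  # Row._mapping is the 1.4 dict-like view
        # passing the list as a separate argument triggers an executemany
        # instead of rendering one giant multi-row INSERT statement
        dest.execute(table_2.insert(), rows)

proxy.close()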