Long story short, ORM overhead is killing your performance.
When you don't need to manipulate the objects involved, it's better to use SQLAlchemy Core expressions ("SQL Expressions"), which are almost as fast as pure SQL.
Solution:
Of course I'm presuming your MySQL and Postgres models have been meticulously kept in sync (i.e. values from a MySQL object can be used to create an object in the Postgres model, and vice versa).
Overview:
- get Table objects out of the declarative classes
- select (SQLAlchemy Expression) the rows from one database
- convert the rows to dicts
- insert them into the other database
More or less:
from sqlalchemy import select, and_

# get Table objects out of the declarative classes
m_table = ItemMySQL.__table__
pg_table = ItemPG.__table__

# SQL Expression that gets a range of rows quickly
pg_q = select([pg_table]).where(
    and_(
        pg_table.c.id >= id_start,
        pg_table.c.id <= id_end,
    ))

# get PG DB rows
eng_pg = DBSessionPG.get_bind()
conn_pg = eng_pg.connect()
result = conn_pg.execute(pg_q)
rows_pg = result.fetchall()

# get a MySQL connection the same way (assuming a DBSessionMySQL
# counterpart to DBSessionPG exists)
eng_m = DBSessionMySQL.get_bind()
conn_m = eng_m.connect()

for row_pg in rows_pg:
    # convert PG row object into dict
    value_d = dict(row_pg)
    # insert into MySQL -- the statement must actually be executed;
    # building it alone sends nothing to the database
    conn_m.execute(m_table.insert().values(**value_d))

# close row proxy object and connections, else suffer leaks
result.close()
conn_pg.close()
conn_m.close()
For background on performance, see the accepted answer (by SQLAlchemy's principal author himself) to: Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?
Since you seem to have Python crashing, perhaps you're running out of memory? Hence I suggest reading and writing the rows in batches, as sketched below.
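For instance, a minimal batching sketch under the same setup as above (batch_size is an illustrative name and value, not from your code); fetchmany() keeps only one chunk of rows in memory at a time:

# re-run the query, but pull rows in fixed-size chunks
batch_size = 1000  # illustrative value; tune to your row width and RAM
result = conn_pg.execute(pg_q)
while True:
    rows_pg = result.fetchmany(batch_size)
    if not rows_pg:
        break
    # one executemany-style call inserts the whole chunk
    conn_m.execute(m_table.insert(), [dict(row) for row in rows_pg])
result.close()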
A further improvement could be using .values to insert a number of rows in one call; see here: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#inserts-and-updates
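A hedged sketch of that multi-row form; the rows and column names (id, name) here are placeholders for illustration, not taken from your models:

# a single INSERT ... VALUES (...), (...) statement
stmt = m_table.insert().values([
    {"id": 1, "name": "first"},   # placeholder row
    {"id": 2, "name": "second"},  # placeholder row
])
conn_m.execute(stmt)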