1

I would like to insert records into table_a from table_b that don't already exist in table table_a. I already have Postgres SQL code to do this, but now my team has requested that I use an ORM (SQLAlchemy) instead.

INSERT INTO table_a
SELECT
  composite_pk1,
  composite_pk2,
  col_c,
  col_d
FROM table_b
ON CONFLICT (
  composite_pk1,
  composite_pk2
) DO NOTHING

I have nearly a million rows and about 15 columns (not shown in the example). I need this query to be fast, which is why I don't think the solution posted here will work for my use case.

For performance reasons I also want to avoid treating my Python function as a data conduit. I don't want to transfer many rows of table_b over the network to my function just to push them back over the network again to table_a. That is, I would prefer the insert to happen entirely on Postgres, which I already accomplish with my original SQL query.

user554481
  • 1,875
  • 4
  • 26
  • 47
  • I have no idea of SQLAlchemy.. but supposing an insert with rows from a query is possible (without ```ON CONFLICT```), I could write a query with a join to accomplish this (since it is ```DO NOTHING```, for updating this would not work, I guess) – Islingre Sep 13 '19 at 19:43

1 Answers1

4

Probably the fastest way to perform an upsert with the usage of SQLAlchemy ORM is through bulk_update_mappings function, that allows you to upsert merely based on a list of dicts.

But the situation you are describing isn't really an upsert - you want to insert rows, and if there is a conflict - do nothing. No update is being done here, therefore it is a simple insert.

To perform an insert that skips any conflicts is a simple thing in SQLAlchemy (assuming you have your table already defined as a model):

from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine('your_db_connection_string', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

# example column names
data = [{'col1': result.col1, 'col2': result.col2} 
        for result in session.query(table_b).all()]
insert_query = insert(table_a).values(data).on_conflict_do_nothing()

session.execute(insert_query)
session.commit()
session.close()
Epion
  • 458
  • 3
  • 7
  • Is there a way to accomplish this without pulling any data out of the database? It looks like the `for` loop you wrote is pulling data from the database into the Python process. The original SQL query I wrote keeps all data on the database. There is a lot of data to be inserted and the operation will be slow if I have to transfer it across the network and back again. – user554481 Sep 13 '19 at 20:46
  • Instead of `values()` use `from_select()`, as described here: https://stackoverflow.com/questions/1849375/how-do-i-insert-into-t1-select-from-t2-in-sqlalchemy (at the bottom of the answer). – Ilja Everilä Sep 14 '19 at 04:09
  • Correct, it is first pulling the data and then performing an insert. I haven't used the insert `from_select()` syntax pointed out by Ilja, but it looks promising. Let us know how it worked out for you. – Epion Sep 14 '19 at 08:34
  • How can we do bulk insert with the on_conflict_do_nothing statement? – Korenz May 06 '21 at 10:26