How to distinguish conflict in sqlalchemy INSERT…ON CONFLICT (Upsert)

Question

I was reading the documentation of SqlAlchemy regarding the upsert operation on postgres dialect at http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#insert-on-conflict-upsert

Is there a way to know if the upsert has been an insert or an update?

The documentation seems to omit this detail.

is the insert on conflict inserting/updating a single row at a time, or rows in batches? — Haleemur Ali, Jul 09 '18 at 14:21
I think when you `execute` an upsert statement the returned value is of type `ResultProxy` with this you can use `is_insert` method to check if the `execute`d statement has inserted data or not - [RESULT_PROXY](http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.ResultProxy). I am not able to confirm this as i dont use postgress database. — gyx-hh, Jul 09 '18 at 15:11
I saw the `ResultProxy.is_insert()` but I am not sure if it applies to the `postgresql.dml.insert()` — Sanandrea, Jul 09 '18 at 15:16
@Sanandrea, `is_insert` is a property not a method (so you can't call it), and `res_proxy.is_insert` will return true if the statement is an insert statement, so unfortunately will not help for this case, as it will return `true` even if an update is performed. — Haleemur Ali, Jul 09 '18 at 15:23
@HaleemurAli thank you, just a lapsus on the property, but I cannot modify the comment. — Sanandrea, Jul 09 '18 at 15:34

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

From the postgresql documentation on insert:

On successful completion, an INSERT command returns a command tag of the form
INSERT oid count
The count is the number of rows inserted or updated. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. The single row must have been inserted rather than updated. Otherwise oid is zero.

Thus it is possible to check for updates vs inserts by examining the query message

A table can be created with OIDs using syntax

CREATE TABLE mytable (...) WITH OIDS

, or OIDs may be enabled on an existing table with syntax

ALTER TABLE mytable SET WITH OIDS

Using sqlalchemy, a table may be created using OIDs as follows:

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = sa.create_engine('postgresql+psycopg2://user:pass@hostname:port/db')

class Person(Base):
    __tablename__ = 'people'
    __table_args__ = {'postgresql_with_oids': True}
    email = sa.Column(sa.Text, primary_key=True)
    name = sa.Column(sa.Text, nullable=False)

Base.metadata.create_all(engine)

And the insert on conflict statement may be constructed like this:

p = {'email': 'hal@hal.hal', 'name': 'Hally hal'}
stmt = pg_insert(Person).values(p)
stmt = stmt.on_conflict_do_update(
    index_elements = [Person.email],
    set_ = {'name': stmt.excluded.name}
)

finally, once the statement is executed, a result proxy is returned, which has a property lastrowid that corresponds to the oid in the query message INSERT oid count

Thus, if you execute stmt the first time

r = engine.execute(stmt)

r.lastrowid will output an integer > 0 since a new row is created

Every other time, r.lastrowid will output 0.

If you need to track upsert of multiple rows at a time, you can set up extra columns as flags that are updated from the on conflict do update portion of your insert statement.

There are many ways to do this, depending on the exact requirements. Here's is 1 alternative.

add an extra column conflict_updated_at = sa.Column(sa.Datetime(True))

change the upsert defintion to

stmt = pg_insert(Person).values(p)
stmt = stmt.on_conflict_do_update(
    index_elements = [Person.email],
    set_ = {'name': stmt.excluded.name,
            'conflict_updated_at': sa.func.now()}
)

thank you for the answer! Do you know a way by any chance if the `on_conflict_do_update` code can be somehow tested with sqlite dialect? My unit tests are all in sqlite and up to now I have used only standard SQL expression. I have used `with_variant` on some data types, but do not know a way to do it with operations. — Sanandrea, Jul 09 '18 at 15:36
it wouldn't work as sqlite doesn't have an upsert mechanism as postgresql does. for this test, your only route is to use a postgresql database. Here's a great answer that deals with configuring postgresql for such usage: https://stackoverflow.com/questions/9407442/optimise-postgresql-for-fast-testing/9407940#9407940 — Haleemur Ali, Jul 09 '18 at 16:11

Erwin Brandstetter · Accepted Answer · 2018-07-09T17:21:29.303

4

Just add a final RETURNING clause with:

...
RETURNING (tbl.xmax = 0) AS inserted

Returns true for inserted rows and false for updated rows. This relies on an undocumented implementation detail, though. For a detailed explanation see:

Adding OIDs (like suggested in another answer) adds cost, bloats the table and burns OIDs (if your table isn't trivially small). That's why the general setting of default_with_oids has been changed to off a long time ago (Postgres 8.1). Quoting the manual:

The use of OIDs in user tables is considered deprecated, so most installations should leave this variable disabled. Applications that require OIDs for a particular table should specify WITH OIDS when creating the table.

edited Jul 09 '18 at 17:21

answered Jul 09 '18 at 16:13

Erwin Brandstetter

605,456
145
1,078
1,228

Thank you, I am accepting yours as (apart from the deprecation reason and bloating) the effort is way lower than migrating the table schema. – Sanandrea Jul 09 '18 at 16:15
Do note that it builds on an implementation detail that may change in future releases (*after* pg 11). – Erwin Brandstetter Jul 09 '18 at 16:20
1

*undocumented implementation detail* is this safe for production? This seems more like a hack at a first glance. – Sanandrea Jul 09 '18 at 16:20
It works reliably in all current versions, but while it's not documented, it may change in future major versions (though unlikely). – Erwin Brandstetter Jul 09 '18 at 16:21
I see, ...together with the fact that is postgresql only functionality, I might as well not use it at all in the first place since all the unit tests in the project are on sqlite. – Sanandrea Jul 09 '18 at 16:24
2

@Sanandrea: Aside: It's a fundamentally flawed idea to test projects supposed to run on Postgres with SQLite. Related: https://stackoverflow.com/a/11278943/939860 – Erwin Brandstetter Jul 09 '18 at 16:27
my experience going along with this unfortunate setup is the same as what you are suggesting, but there is still the fear that once we abandon the standard SQL expressions we might not be RDBMS agnostic from the application point of view. Thank you again! – Sanandrea Jul 09 '18 at 16:33
1

Nitpicking: Being "RDBMS agnostic" is a false goal. Impossible, even, since there is no RDBMS in existence that's even close to the SQL standard. (And the standard has pretty weak spots.) Postgres is probably the closest among the big ones. Still a valid idea to prefer SQL standard ways where possible. OK, we are dragging this out. I think we both got the point. – Erwin Brandstetter Jul 09 '18 at 16:37

How to distinguish conflict in sqlalchemy INSERT…ON CONFLICT (Upsert)

2 Answers2