1

I'm trying to insert a row if the same primary key does not exist yet (ignore in that case). Doing this from Python, using psycopg2 and Postgres version 9.3.

There are several options how to do this: 1) use subselect, 2) use transaction, 3) let it fail.

It seems easiest to do something like this:

try:
    cursor.execute('INSERT...')
except psycopg2.IntegrityError:
    pass

Are there any drawbacks to this approach? Is there any performance penalty with the failure?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
tadeas
  • 189
  • 1
  • 6
  • 1
    Good question. In my experience a subselect with `NOT EXISTS` is cheapest (while only 99.99 % safe against race conditions), but I cannot pin numbers on the cost of raising an exception instead. It also largely depends on the rate of key collisions. – Erwin Brandstetter Dec 22 '14 at 19:28

2 Answers2

1

The foolproof way to do it at the moment is try the insert and let it fail. You can do that at the app level or at the Postgres level; assuming it's not part of a procedure being executed on the server, it doesn't materially matter if it's one or the other when it comes to performance, since either way you're sending a request to the server and retrieving the result. (Where it may matter is in your need to define a save point if you're trying it from within a transaction, for the same reason. Or, as highlighted in Craig's answer, if you've many failed statements.)

In future releases, a proper merge and upsert are on the radar, but as the near-decade long discussion will suggest implementing them properly is rather thorny:

With respect to the other options you mentioned, the above wiki pages and the links within them should highlight the difficulties. Basically though, using a subselect is cheap, as noted by Erwin, but isn't concurrency-proof (unless you lock properly); using locks basically amounts to locking the entire table (trivial but not great) or reinventing the wheel that's being forged in core (trivial for existing rows, less so for potentially new ones which are inserted concurrently if seek to use predicates instead of a table-level lock); and using a transaction and catching the exception is what you'll end up doing anyway.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • `MERGE` is absolutely *not* the solution for upsert. It's commonly and incorrectly abused for that on other DBMSes, but it's actually unsafe to do that, as `MERGE` doesn't offer an atomic upsert. It's no better than `not exists`. – Craig Ringer Dec 23 '14 at 23:14
  • Also, it is *not possible* to use row level locking to implement an upsert correctly, because you would have to be able to acquire a predicate lock on the key, which isn't supported. – Craig Ringer Dec 23 '14 at 23:20
  • @CraigRinger: It's possible using a table-level lock, or (equivalently) a predicate lock on e.g. the table's name. – Denis de Bernardy Dec 23 '14 at 23:24
  • Sure - but that's not really an interesting solution to the upsert problem and it's also trivial. You refer to *"using a subselect is not concurrency proof (unless you lock properly, which is non-trivial)*; my point is that this can never work. – Craig Ringer Dec 24 '14 at 00:22
0

Work is ongoing to add a native upsert to PostgreSQL 9.5, which will probably take the form of an INSERT ... ON CONFLICT UPDATE ... statement.

In the mean time, you must attempt the update and if it fails, retry. There's no safe alternative, though you can loop within a PL/PgSQL function to hide this from the application.

Re trying and letting it fail:

Are there any drawbacks to this approach?

It creates a large volume of annoying noise in the log files. It also burns through transaction IDs very rapidly if the conflict rate is high, potentially requiring more frequent VACUUM FREEZE to be run by autovacuum, which can be an issue on large databases.

Is there any performance penalty with the failure?

If the conflict rate is high, you'll be doing a bunch of extra round trips to the database. Otherwise not much really.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778