7

I have this query that attempts to add rows to the balances table if no corresponding row exists in the totals table. The query is run in a transaction using the default isolation level on PostgreSQL.

INSERT INTO balances (account_id, currency, amount)
SELECT t.account_id, t.currency, 0
FROM balances AS b
RIGHT OUTER JOIN totals AS t USING (account_id, currency)
WHERE b.id IS NULL

I have a UNIQUE constraint on balances (account_id, currency). I'm worried that a race condition will lead to duplicate key errors if multiple sessions execute this query concurrently. I've seen many questions on that subject, but they all seem to involve either subqueries, multiple queries or PL/pgSQL functions.

Since I'm not using any of those in my query, is it free from race conditions? If it isn't, how can I fix it?

LordOfThePigs
  • Yes, you can still get duplicate key errors. For the same reason you can get it when two sessions run the same `insert` statement with the same `values` clause. The statement sees a consistent state of the database while it is running, so it will not see any new rows even if they are committed while the statement runs. –  Apr 23 '14 at 13:10
  • A plpgsql function looping in case of a duplicate key violation can deal with the race condition on the server side and at the default isolation level, which is *safe* and typically *cheapest*. The app doesn't have to bother with retries: http://stackoverflow.com/questions/15939902/is-select-or-insert-in-a-function-prone-to-race-conditions/15950324#15950324 – Erwin Brandstetter Jan 20 '15 at 11:53

2 Answers

3

Yes, it will fail with a "duplicate key value violates unique constraint" error. What I do is place the insertion code in a try/except block and, when the exception is thrown, catch it and retry. It's that simple. Unless the application has a huge number of users, it will work flawlessly.

For your query the default isolation level is enough, since it is a single insert statement and there is no risk of phantom reads.

Notice that even when setting the isolation level to serializable, the try/except block cannot be avoided. From the manual about serializable:

like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures
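
The retry idea can also be handled server side, as the comment under the question suggests. Here is a minimal PL/pgSQL sketch of that loop-and-catch pattern, reusing the table and column names from the question; the function name is made up and the details are illustrative, not code from the answer:

CREATE OR REPLACE FUNCTION insert_missing_balances()
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
    LOOP
        BEGIN
            -- the query from the question
            INSERT INTO balances (account_id, currency, amount)
            SELECT t.account_id, t.currency, 0
            FROM balances AS b
            RIGHT OUTER JOIN totals AS t USING (account_id, currency)
            WHERE b.id IS NULL;

            RETURN;  -- success, we are done
        EXCEPTION WHEN unique_violation THEN
            NULL;    -- another session inserted a conflicting row; loop and try again
        END;
    END LOOP;
END
$func$;

At the default Read Committed level each retry of the INSERT sees a fresh snapshot, so rows committed by the competing session are excluded by the anti-join on the next pass and the loop terminates.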

Clodoaldo Neto
  • I actually prefer retrying over setting the transaction isolation level to serializable. It shouldn't be a problem even with many users, because the set of rows that could be missing is limited, so most of the time this query is expected to be a NOP. – LordOfThePigs Apr 23 '14 at 13:30
  • @Lord It is not serializable vs retry. Check the update. – Clodoaldo Neto Apr 23 '14 at 13:46
  • If I have to retry anyway, then I'd rather not change the transaction isolation level. I wish I could accept both answers... – LordOfThePigs Apr 23 '14 at 13:57
1

The default transaction isolation level is Read Committed. Phantom reads are possible at this level (see Table 13.1 in the manual). While you are protected from seeing any weird effects in the totals table were you to update the totals, you are not protected from phantom reads in the balances table.

What this means can be illustrated with a single query similar to yours that attempts the outer join twice (and only reads, inserting nothing). The fact that a balance is missing is not guaranteed to stay the same between the two "peeks" at the balances table. The sudden appearance of a balance that wasn't there when the same transaction first looked is called a "phantom read".

In your case, several concurrent statements can see that a balance is missing, and nothing prevents them from trying to insert it and erroring out.

To rule out phantom reads (and to fix your query), you need to set the SERIALIZABLE isolation level prior to running your query:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
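
A minimal sketch of how that could be wrapped around the query from the question (the BEGIN/COMMIT framing is assumed, not part of the original query); as noted above, the application must still be prepared to retry if the transaction fails with a serialization error or a duplicate key error:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

INSERT INTO balances (account_id, currency, amount)
SELECT t.account_id, t.currency, 0
FROM balances AS b
RIGHT OUTER JOIN totals AS t USING (account_id, currency)
WHERE b.id IS NULL;

-- COMMIT may fail with a serialization failure; if so, rerun the whole block
COMMIT;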

Jirka Hanika
  • That's what I thought. Thanks for confirming that. I added a part in my question: How can I fix my query? – LordOfThePigs Apr 23 '14 at 13:04
  • A single statement is not subject to phantom reads. Phantom reads can only happen if you run two statements one after the other in a single transaction. For a single insert statement, even when based on a SELECT statement, phantom reads cannot happen (the statement sees a consistent state of the database while it is running). What *can* happen is that while the insert is running other transactions are inserting values which result in a PK violation but that is not caused by a phantom read. –  Apr 23 '14 at 13:06
  • @a_horse_with_no_name - I know of no special rules for single statements that join the same table repeatedly (they can compile into arbitrary execution plans as far as I can tell from documentation). I agree that this question is not directly about a phantom read scenario. What I am saying is that the default transaction level is insufficient - it is not protecting the balances table via a table lock. – Jirka Hanika Apr 23 '14 at 13:11
  • Is there any other way than setting the transaction isolation level to serializable? This sounds a bit radical. – LordOfThePigs Apr 23 '14 at 13:22
  • @Jirka A repeated self join still happens inside a single select. – Clodoaldo Neto Apr 23 '14 at 13:31
  • @ClodoaldoNeto - Sure, that's a tautology. Now, inside the single select, the same row may be reached through multiple indexes independently. And more to the point of this post, the presence/absence of a row may be evaluated repeatedly. – Jirka Hanika Apr 23 '14 at 14:14
  • @LordOfThePigs - With the right indexes, this query can execute pretty fast and not hurt performance if you are not running it all the time. Alternatively you could lock the balances table explicitly (and stay on the default isolation level), which would avoid a table lock on the totals table; see the sketch after these comments. – Jirka Hanika Apr 23 '14 at 14:18
  • @LordOfThePigs - And for clarity - I'm proposing `SERIALIZABLE` only for the query mentioned in your question. Not for any other queries run against your database. – Jirka Hanika Apr 23 '14 at 14:21
  • @JirkaHanika What's the advantage from the application point of view of switching to SERIALIZABLE and having to retry the query on a duplicate key error, versus leaving it on the default level and also having to retry on a duplicate key error. Note that in the vast majority of the cases, I expect this query not to insert anything. – LordOfThePigs Apr 23 '14 at 16:10
  • @LordOfThePigs - You have no reason to retry if a non-default isolation level is only used for this query. But I was wrong in thinking that you will never get a serialization failure. You may get one, so the catch block would be needed (doing nothing). I will edit my answer to incorporate this lesson, and I consider the accepted answer superior to mine. – Jirka Hanika Apr 24 '14 at 09:34
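
For reference, a minimal sketch of the explicit-lock alternative mentioned in the comments above, staying at the default isolation level. The lock mode is an assumption; any mode that conflicts with itself works, at the cost of serializing writers on balances:

BEGIN;
-- Serialize concurrent runs of this block: SHARE ROW EXCLUSIVE conflicts with itself
-- (and with plain writes), but still allows ordinary SELECTs on balances.
LOCK TABLE balances IN SHARE ROW EXCLUSIVE MODE;

INSERT INTO balances (account_id, currency, amount)
SELECT t.account_id, t.currency, 0
FROM balances AS b
RIGHT OUTER JOIN totals AS t USING (account_id, currency)
WHERE b.id IS NULL;

COMMIT;

The second session waits on the lock until the first commits; its INSERT then runs against a snapshot that already contains the newly committed rows, so the anti-join skips them.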