0

Given:

customer[id BIGINT AUTO_INCREMENT PRIMARY KEY, email VARCHAR(30), count INT]

I'd like to execute the following atomically: Update the customer if he already exists; otherwise, insert a new customer.

In theory this sounds like a perfect fit for SQL-MERGE but the database I am using doesn't support MERGE with AUTO_INCREMENT columns.

https://stackoverflow.com/a/1727788/14731 seems to indicate that if you execute a query or update statement against a non-existent row, the database will lock the index thereby preventing concurrent inserts.

Is this behavior guaranteed by the SQL standard? Are there any databases that do not behave this way?

UPDATE: Sorry, I should have mentioned this earlier: the solution must use READ_COMMITTED transaction isolation unless that is impossible in which case I will accept the use of SERIALIZABLE.

Community
  • 1
  • 1
Gili
  • 86,244
  • 97
  • 390
  • 689
  • So which DBMS are you using? And no, it is not guaranteed by the SQL standard that "non-existing" rows are locked (seems quite a strange concept actually) –  May 03 '13 at 18:04
  • See this related question from earlier today: http://stackoverflow.com/questions/16359900/ignoring-errors-in-concurrent-insertions – Denis de Bernardy May 03 '13 at 18:17
  • @a_horse_with_no_name, I am using [H2](http://www.h2database.com/) but I'm fishing for a solution that is portable. So you're saying there is nothing along the lines of `SELECT FOR INSERT`? Please post a formal answer so we can comment on it. – Gili May 04 '13 at 05:48
  • The H2 specific `merge` should work: `merge into customer(id, email, count) key(email) values(null, 'test@acme.com', 10)`. But this will not work in other databases. – Thomas Mueller May 05 '13 at 13:56
  • @ThomasMueller, I'm confused. According to the last comment on stackoverflow.com/a/6307884/14731 you told me that `supplying NULL when using MERGE doesn't make sense` but now you're telling me to do exactly that. If the misunderstanding revolves around the use of `key()` then perhaps you could update the first part of your answer to make use of it? – Gili May 06 '13 at 04:28
  • @Gili supplying null (unknown) for a *key* doesn't make sense. In your case, the key is `email`, and you do know the value. I'm not sure if the `merge into customer(id, email, count) key(email) values(null, 'test@acme.com', 10)` would always update the row however; it might need to be change to prevent that. So yes, it might not work just like that. – Thomas Mueller May 06 '13 at 05:01
  • @ThomasMueller, just to double check we're talking about the same thing. In your answer, did you understand that the table primary key is actually `id` even though I am merging on `email` (which is **not** a primary key)? ... and that is legal? – Gili May 06 '13 at 13:38
  • @Gili yes it is legal. You should have a unique key on email however (without index performance is bad, and you want the records to be unique as far as I understand). The `merge` statement would look something like: `merge into customer(id, email, count) key(email) values((select max(id) from customer c2 where c2.email='test@acme.com'), 'test@acme.com', 10)` - meaning re-use the id if a record exists, but use `null` if not. – Thomas Mueller May 06 '13 at 14:25
  • @ThomasMueller, excellent explanation. Can you please add this example to http://stackoverflow.com/a/6307884/14731? – Gili May 07 '13 at 04:13
  • @Gili I updated the answer. But please note the two question are slightly different: here you have a key (`email`) that isn't the autoincrement column. In the other question there is only `id`. – Thomas Mueller May 07 '13 at 07:03

4 Answers4

2

Use Russell Fox's code but use SERIALIZABLE isolation. This will take a range lock so that the non-existing row is logically locked (together with all other non-existing rows in the surrounding key range).

So it looks like this:

BEGIN TRAN
IF EXISTS (SELECT 1 FROM foo WITH (UPDLOCK, HOLDLOCK) WHERE [email] = 'thisemail')
BEGIN
    UPDATE foo...
END
ELSE
BEGIN
    INSERT INTO foo...
END
COMMIT

Most code taken from his answer, but fixed to provided mutual exclusion semantics.

usr
  • 168,620
  • 35
  • 240
  • 369
  • 1
    That's SQL Server syntax. If Gili was using that, he could also use the MERGE statement. –  May 03 '13 at 20:20
  • I overlooked that. But surely this code is translatable to his system?! The key is using serializable isolation. – usr May 03 '13 at 20:24
  • 1
    @usr, wouldn't REPEATABLE_READ sufficient? I need to prevent inserting of specific values, not a whole range of values. – Gili May 04 '13 at 06:06
  • 1
    @Gili REPEATABLE_READ does not lock non-existing rows and it also does not lock ranges of rows (alas! I wish it did). It does not prevent rows appearing later, it only keeps rows from changing. – usr May 04 '13 at 10:36
2

This question is asked about once a week on SO, and the answers are almost invariably wrong.

Here's the right one.

insert customer (email, count) 
select 'foo@example.com', 0
where not exists (
      select 1 from customer
      where email = 'foo@example.com'
)

update customer set count = count + 1
where email = 'foo@example.com'

If you like, you can insert a count of 1 and skip the update if the inserted rowcount -- however expressed in your DBMS -- returns 1.

The above syntax is absolutely standard and makes no assumption about locking mechanisms or isolation levels. If it doesn't work, your DBMS is broken.

Many people are under the mistaken impression that the select executes "first" and thus introduces a race condition. No: that select is part of the insert. The insert is atomic. There is no race.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
  • I question: `select is part of the insert. The insert is atomic. There is no race.` Take a look at http://stackoverflow.com/a/5598275/14731 which shows that SQL unions are **not** atomic. I expect insert+select to behave the same as a union. – Gili May 06 '13 at 13:42
  • @Gili: that statement is only true for SQL Server. Oracle and Postgres **guarantee** *statement level* read consistency. So the whole statement (including a UNION) sees a "snapshot" of the database that reflects the state when the statement *started*. While the statement is running it will never see any changes (not even committed ones) in those DBMS (or any other DBMS that uses MVCC I would assume) –  May 06 '13 at 15:44
  • @Gill, you may find reading the documentation a better use of your time. I'm not guessing and I'm not expecting. I'm explaining, because I know. And, btw UNION is atomic on SQL Server too. It's part of the standard. The SO answer you refer to *is wrong* and is based on a misunderstanding of the server's behavior. – James K. Lowden May 13 '13 at 04:21
  • 2
    -1 You are confusing atomicity and isolation. Without a unique constraint on `customer(email)` you **can** end up with duplicates [as this user found out](http://stackoverflow.com/questions/3407857/only-inserting-a-row-if-its-not-already-there) – Martin Smith Sep 15 '13 at 20:02
  • @MartinSmith, so then what is the "correct" way to insert or update? Is there a way to do this which doesn't rely on DB-specific features? – Gili Sep 15 '13 at 20:26
  • 1
    @Gili - If you have a unique constraint on the column then attempting the insert, checking for error and doing the update on key violation should work in all. If you don't have a unique constraint then you could use the `INSERT ... SELECT` code here at `SERIALIZABLE` isolation level. – Martin Smith Sep 15 '13 at 20:33
  • 1
    @a_horse_with_no_name -SQL Server has read committed snapshot as well that operates in the same way as far as that `UNION` is concerned. Of course the code in this answer would likely still insert duplicates under load at RCSI. Two transactions could both start at the same time. Read a snapshot where the row does not exist and proceed to insert one. – Martin Smith Sep 15 '13 at 20:38
  • @MartinSmith, please post a formal answer (with some pseudo code) because I have follow-up questions and this answer already has too many comments. – Gili Sep 15 '13 at 20:39
  • @Gili - Sorry got other things to do. – Martin Smith Sep 15 '13 at 20:47
  • @MartinSmith, if I use a unique constraint and I attempt to `SELECT` after a conflict, who is to say that someone won't remove the row before I `SELECT` it? Are you implying that I should loop forever with `INSERT ... SELECT` until I get a consistent result? – Gili Sep 15 '13 at 21:01
  • And just to labour this point a bit further see the Index Intersection example [here](http://blogs.msdn.com/b/craigfr/archive/2007/05/02/query-plans-and-read-committed-isolation-level.aspx) which shows that a `select * from t` at read committed can (with the right execution plan and concurrent activity) return inconsistent column values within the same row. – Martin Smith Sep 16 '13 at 13:21
1

Answering my own question since there seems to be a lot of confusion around the topic. It seems that:

-- BAD! DO NOT DO THIS! --
insert customer (email, count) 
select 'foo@example.com', 0
where not exists (
      select 1 from customer
      where email = 'foo@example.com'
)

is open to race-conditions (see Only inserting a row if it's not already there). From what I've been able to gather, the only portable solution to this problem:

  1. Pick a key to merge against. This could be the primary key, or another unique key, but it must have a unique constraint.
  2. Try to insert a new row. You must catch the error that will occur if the row already exists.
  3. The hard part is over. At this point, the row is guaranteed to exist and you are protected from race-conditions by the fact that you are holding a write-lock on it (due to the insert from the previous step).
  4. Go ahead and update if needed or select its primary key.
Community
  • 1
  • 1
Gili
  • 86,244
  • 97
  • 390
  • 689
0
IF EXISTS (SELECT 1 FROM foo WHERE [email] = 'thisemail')
BEGIN
    UPDATE foo...
END
ELSE
BEGIN
    INSERT INTO foo...
END
Russell Fox
  • 5,273
  • 1
  • 24
  • 28