4

In my postgresql DB, I have a table "my_table" with primary key on columns (a, b).

I wrote the query below for inserts into this table which ensures that the primary key constraint is never violated.

INSERT INTO my_table(a, b, c, d, e, f)
  (SELECT 'a1',
          'b1',
          'c1',
          1442849052013,
          1,
          FALSE
   FROM
     (SELECT 'a1'::VARCHAR(100) AS a,
             'b1'::VARCHAR(50) AS b) AS new_fields
   LEFT OUTER JOIN my_table AS old_fields ON (new_fields.a = old_fields.a
                                              AND new_fields.b = old_fields.b)
   WHERE old_fields.a IS NULL);  

As multiple such queries get fired by concurrently running threads, I sometimes get

PSQLException: duplicate key value violates unique constraint "my_table_pkey". Detail: Key (a, b)=(a1, b1) already exists.

I am unable to understand that how can this query lead to the specified error condition as this query inserts into "my_table" only if its primary key columns are missing and that this condition is being checked within the same query.

If this is a valid scenario, doesn't this mean that query execution in postgreSQL is not atomic ?

Can somebody please explain...

Here, Select For Update syntax will not work as well because Select For Update takes a lock on selected row and I want to insert a row if it not already exists.

Vijay Kansal
  • 809
  • 12
  • 26
  • It is my understanding, query execution in most RDBMS is not atomic; that is why TRANSACTIONs exist. – Uueerdo Sep 21 '15 at 17:07
  • @ Uueerdo Multiple queries' execution is not atomic, but a single query's execution must be atomic, no ? – Vijay Kansal Sep 21 '15 at 17:11
  • Please format your SQL to make it easier to read. – Basil Bourque Sep 21 '15 at 17:19
  • 1
    "Atomicity" refers to the transaction either completing or doing nothing -- no in betweens allowed. This is a matter of transaction isolation. – David Aldridge Sep 22 '15 at 12:16
  • 1
    It happens because the statement ***is*** atomic. It doesn't see anything that happened _after_ it started, or anything that wasn't committed _before_ it started. The query - for the duration of its execution - sees the database in the state it was when the query started. That's the idea of being "atomic". –  Sep 22 '15 at 12:20
  • 1
    @a_horse_with_no_name I think that's more a description of the transaction isolation level -- the "I" in ACID. They're all related, of course, but "A" really means that if you're inserting 5 rows into one table and 3 into another, then they either all commit or none of them. – David Aldridge Sep 22 '15 at 12:27

1 Answers1

3

The default transaction isolation level in PostgreSQL, read committed, does not see any uncommitted data.

Thus, if another uncommitted transaction has inserted ('a1', 'b1'), the other transactions cannot see it.

However the presence of constraints will guarantee that another transaction's attempted insert of ('a1', 'b1') cannot complete until the first transaction has either rolled back (and your transaction can complete successfully), or committed (and your transaction will fail with a constraint violation).

A complete explanation is in the documentation.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • Thanks a lot David for explaining race conditions occurring with my query. Can you please suggest a way in Postgres to insert a row only if primary key of that row does not already exist in the table ? Currently, I am doing away with this by handling exception thrown in case Unique Constraint Violation occurs. – Vijay Kansal Sep 22 '15 at 17:00
  • 1
    @VijayKansal: See here: http://stackoverflow.com/q/26577258/330315 and here: http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql –  Sep 22 '15 at 17:32