2

I have a table ContentAddressedFiles where the combination of the columns hash, size, and extension are UNIQUE. I want to create a stored procedure that when called, will insert a new record into the table with the given values. If a record for these values already exists, I want to just return that existing record. Here's my approach:

CREATE OR REPLACE FUNCTION INIT_CAF( id_in_case_of_new UUID, _hash VARCHAR(255), _size INTEGER, _extension VARCHAR(255), _mimeType VARCHAR(255))
RETURNS "ContentAddressedFiles"
AS $$
DECLARE
  caf "ContentAddressedFiles"%ROWTYPE;
BEGIN

  INSERT INTO "ContentAddressedFiles" (id, hash, size, extension, "mimeType", "createdAt", "updatedAt")
  VALUES( id_in_case_of_new, _hash, _size, _extension, _mimeType, NOW(), NOW() ) RETURNING * INTO caf;

  RETURN caf;

EXCEPTION WHEN unique_violation THEN

  SELECT * FROM "ContentAddressedFiles" INTO caf WHERE "hash" = _hash AND "size" = _size AND "extension" = _extension;

  IF NOT FOUND THEN
    RAISE EXCEPTION 'This should never happen.';
  END IF;

  RETURN caf;

END;
$$ LANGUAGE plpgsql;

However, when I call the procedure from concurrent transactions, I consistently get the exception:

EXCEPTION: This should never happen.

How is this even possible? The procedure does not seem to be able to SELECT the reason for the failing INSERT before (it's not the id that clashes, it's just the tuple of <hash, size, extension>.

DeX3
  • 5,200
  • 6
  • 44
  • 68
  • so you see raised exveption and null instead of expected row?.. please add output you have to the question – Vao Tsun Feb 09 '17 at 08:17
  • Yes, exactly I get the exception when I run the procedure concurrently from different transactions. – DeX3 Feb 09 '17 at 08:19
  • http://stackoverflow.com/questions/6722344/select-or-insert-a-row-in-one-command –  Feb 09 '17 at 08:20
  • Possible duplicate of [SELECT or INSERT a row in one command](http://stackoverflow.com/questions/6722344/select-or-insert-a-row-in-one-command) – Laurenz Albe Feb 09 '17 at 08:28
  • 1
    I think he asks how it happens - not how to do this task easy way. I totally agree with the efficiency of link given by a_horse_with_no_name 10 mins ago, but what is the flaw with @DeX3 logic? – Vao Tsun Feb 09 '17 at 08:31
  • `id_in_case_of_new` could cause a `unique_violation` too. And I think you might misunderstood how that could work: if you supply the `id` column in an `INSERT` it will use it's value during insert (even if it's `NULL`). – pozs Feb 09 '17 at 08:44
  • no, I specifically need to generate the id on the application-side, but I can guarantee that in case of my tests, the id is not the column causing the clash (it's an UUIDv4 by the way). – DeX3 Feb 09 '17 at 08:50
  • 1
    Well, with [`COLUMN_NAME` / `CONSTRAINT_NAME` of `GET STACKED DIAGNOSTICS`](https://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-EXCEPTION-DIAGNOSTICS), you can make sure. – pozs Feb 09 '17 at 08:56
  • I too wondered what is the original message? `EXCEPTION WHEN unique_violation THEN raise info '%',SQLERRM;` – Vao Tsun Feb 09 '17 at 09:05

1 Answers1

1

The question how to avoid the problem is answered with the comments; I'll explain here why PostgreSQL behaves in the observed way.

The reason is that the INSERT and the SELECT statement in the function see different snapshots (states) of the database, since the transaction runs with the default isolation level of READ COMMITTED. At that isolation level every statement gets a new database snapshot.

The explanation for the observed behaviour must be that a concurrent transaction deletes or modifies the row between the failed INSERT and the following SELECT statement, so that the row that caused the constraint violation for the INSERT is no longer there when the SELECT is run.

There are two approaches to deal with the problem:

  • Choose a higher isolation level: Then both statements will see the same snapshot of the database, and the row that prevented the INSERT will be found by the SELECT, even though it has been changed in the meantime. That is no problem, it just means that the whole transaction logically takes place at the time the snapshot was taken.

  • Run both statements as a single statement with a CTE, like the solution given in the comments recommends. Then they will also see the same database snapshot.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Yep, you are right, this is the most likely scenario (even if the window for race condition is tiny). At least for `VOLATILE` functions; from `STABLE`, snapshotting is different, but that category wouldn't allow `INSERT`s anyway. Maybe a great example of this snapshotting is [how the original UPSERT example](https://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING) works. – pozs Feb 09 '17 at 15:49