0

I have a table declared like so:

CREATE TABLE my_table (
    ...
    guid uuid UNIQUE DEFAULT create_guid('my_table')
    ....
);

I have a function to compute unique GUIDs.

CREATE FUNCTION create_guid(in_table text) 
RETURNS uuid AS $$
DECLARE 
    v_guid uuid;
    v_rows int;
BEGIN
    v_guid := md5(current_timestamp::text||random()::text);
    EXECUTE 'SELECT 1 FROM ' || quote_ident(in_table) ||'  WHERE guid=' || quote_literal(v_guid);
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    WHILE v_rows > 0 LOOP -- can't use FOUND with EXECUTE
        v_guid := md5(current_timestamp::text||random()::text||v_guid::text);
        EXECUTE 'SELECT 1 FROM ' || quote_ident(in_table) ||'  WHERE guid=' || quote_literal(v_guid);
        GET DIAGNOSTICS v_rows = ROW_COUNT;
    END LOOP;
    RETURN v_guid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;

I have never experienced an INSERT failure in my production environment, but in my test environment I can fairly reliably obtain an error similar to the following:

ERROR: duplicate key value violates unique constraint 'my_table_guid_key'
DETAIL:  Key (guid)=(fed050ad-61c4-d548-3008-2de01301c2fc) already exists.

I realize current_timestamp could be an identical value and I suppose random() could as well, but it should be rather unlikely and its not. Even if they they were the same, wouldn't the while loop prevent duplication?

I am, of course, using the default value when inserting. How can this happen? What would fix it?

Dwayne Towell
  • 8,154
  • 4
  • 36
  • 49
  • `current_timestamp` is defined to return the same results (within a function/transaction), at least you can use `statement_timestamp()` or preferably `clock_timestamp()` http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT -- `GET DIAGNOSTICS` is a little bit ugly, you can use `EXECUTE 'SELECT COUNT(...) ...' INTO ...` – pozs Nov 11 '14 at 08:33

1 Answers1

3

I have a function to compute unique GUIDs

There's your problem, you've reinvented a UUID algorithm generation variant, and it's a buggy one. Instead, just:

CREATE EXTENSION "uuid-ossp";

SELECT uuid_generate_v4();

... which produces output like:

           uuid_generate_v4           
--------------------------------------
 d3e3a21c-877d-4731-a119-09b25015cb7a
(1 row)

A v4 UUID isn't going to collide, even with the birthday paradox in play. Really. Including timestamps actually increases the chance of a collision rather than decreasing it.

While you're at it, store UUIDs with the uuid data type, it's faster and more compact. Or at least store them in bytea form as big-endian values. Anything but a padded, formatted text representation that'll be slow and wasteful.


Even if they they were the same, wouldn't the while loop prevent duplication?

No, because it's a variant of the upsert / create-if-not-exists problem where two concurrent transactions can each insert the same value, but neither can see the other's value.


I cleaned up the function to make it a little more readable, and don't see anything obviously wrong with the logic other than the very weak uuid generation scheme and that it won't work in the face of concurrency.

CREATE FUNCTION create_guid(in_table text) 
RETURNS uuid AS $$
DECLARE 
    v_guid uuid;
    v_matched boolean = 1;
BEGIN
    WHILE v_matched
    LOOP
      v_guid := md5(current_timestamp::text||random()::text);
      EXECUTE format('SELECT 1 FROM %I  WHERE guid = %L', in_table, v_guid)
      INTO v_matched;
    END LOOP;
    RETURN v_guid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;

It's declared volatile, it's using two volatile functions, and it's doing a recheck loop. If you hold an exclusive lock on the target table then it looks sane. (If you don't hold an exclusive lock it can still fail with a duplicate key error if there are other concurrent inserters).

It seems likely that the issue is in the test harness or other components outside what's visible here.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • I know all transactions are sequential in my test environment so I don't understand how the upsert problem can exist. – Dwayne Towell Nov 11 '14 at 06:50
  • I am storing them as uuid's. Maybe you didn't read the question? – Dwayne Towell Nov 11 '14 at 06:51
  • I know about uid_generate_v4() et al. That is not my question. – Dwayne Towell Nov 11 '14 at 06:51
  • @DwayneTowell No, sadly generating a uuid value does not prevent other transactions to generate the same uuid value -- while inserting that value could prevent other transactions to insert the same uuid. You cannot do that in a uuid generator function, all you can do is to lock the whole table, or use advisory locks http://www.postgresql.org/docs/current/static/explicit-locking.html#ADVISORY-LOCKS (but the already provided uuid generator functions should be enough) – pozs Nov 11 '14 at 08:53
  • @DwayneTowell I indeed failed to notice that you're using `uuid`. As for the rest, if you don't mention it in the question there's no way to know. The point remains that trying to generate your own UUIDs is almost certainly *wrong* and that you're taking a particularly weak UUID generation approach, certain to be worse than all the standard approaches. I don't see an obvious reason why the function would return repeat UUIDs, but without some way to see the test env and the difference between it and production it's hard to say more about what could be going on. – Craig Ringer Nov 11 '14 at 09:41