
On my table I have a secondary unique key labeled md5. Before inserting, I check whether the MD5 exists and, if it does not, insert it, as shown below:

-- Attempt to find this item
SELECT INTO oResults (SELECT domain_id FROM db.domains WHERE "md5"=oMD5);

IF (oResults IS NULL) THEN

    -- Not found, so insert this domain
    INSERT INTO db.domains ("md5", "domain", "inserted")
        VALUES (oMD5, oDomain, now());

    -- Return the id that the sequence just generated for the new row
    RETURN currval('db.domains_seq');

END IF;

This works great for single-threaded inserts. My problem arises when two external applications call my function concurrently with the same MD5. I end up with a situation where:

App 1: Sees the MD5 does not exist

App 2: Inserts this MD5 into table

App 1: Goes to insert the MD5 into the table, since it thinks it doesn't exist, but gets an error because App 2 inserted it right after App 1 saw that it was missing.

Is there a more effective way of doing this?

Can I catch the error on insert and if so, then select the domain_id?

Thanks in advance!


This also seems to be covered at "Insert, on duplicate update in PostgreSQL?"

Anthony Greco

1 Answer

You could just go ahead and try to insert the MD5 and catch the error. If you get a "unique constraint violation" error, ignore it and keep going; if you get some other error, bail out. That way you push the duplicate checking right down to the database and your race condition goes away.

Something like this:

  • Attempt to insert the MD5 value.
    • If you get a unique violation error, then ignore it and continue on.
    • If you get some other error, bail out and complain.
    • If you don't get an error, then continue on.
  • Do your SELECT INTO oResults (SELECT domain_id FROM db.domains WHERE "md5"=oMD5) to extract the domain_id.
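The steps above could be sketched in PL/pgSQL roughly as follows. This is only an illustration: the function name db.get_or_insert_domain, the text parameter types, and the integer return type are assumptions, since the question does not show the function signature or the column types. It relies on PL/pgSQL's standard EXCEPTION clause and the unique_violation condition name.

```sql
-- Sketch only: function name, parameter types, and return type are assumed.
CREATE OR REPLACE FUNCTION db.get_or_insert_domain(oMD5 text, oDomain text)
RETURNS integer AS $$
DECLARE
    oResults integer;
BEGIN
    -- Attempt the insert first and let the unique key on "md5" do the checking.
    BEGIN
        INSERT INTO db.domains ("md5", "domain", "inserted")
            VALUES (oMD5, oDomain, now());
    EXCEPTION
        WHEN unique_violation THEN
            -- The MD5 is already there (possibly inserted by another session
            -- a moment ago); ignore the error and continue to the SELECT.
            NULL;
        -- Any other error is not caught here, so it propagates to the caller.
    END;

    -- Whether we inserted the row or someone else did, it exists now.
    SELECT INTO oResults (SELECT domain_id FROM db.domains WHERE "md5"=oMD5);
    RETURN oResults;
END;
$$ LANGUAGE plpgsql;
```

A call such as SELECT db.get_or_insert_domain('<md5 hex>', 'example.com'); would then return the same domain_id no matter which session won the race. Note that a block with an EXCEPTION clause sets up a subtransaction on entry, which is where the performance cost comes from.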

There might be a bit of a performance hit but "correct and a little slow" is better than "fast but broken".

Eventually you might end up with more exceptions than successful inserts. In that case, you could try inserting into the table that references your db.domains (through a foreign key) and trap the FK violation there. If you get an FK violation, do the old "insert and ignore unique violations" on db.domains and then retry the insert that gave you the FK violation. This is the same basic idea; it is just a matter of choosing whichever approach will probably throw the fewest exceptions and going with that.
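A rough sketch of that FK-first variant is below. Everything about the referencing table is hypothetical: db.hits, its "md5" and "seen" columns, and the function name db.record_hit are invented for illustration, and the sketch assumes db.hits."md5" has a non-deferred foreign key pointing at the unique db.domains."md5" column.

```sql
-- Sketch only: db.hits, its columns, and db.record_hit are hypothetical.
-- Assumes db.hits."md5" REFERENCES db.domains ("md5") and is not deferred.
CREATE OR REPLACE FUNCTION db.record_hit(oMD5 text, oDomain text)
RETURNS void AS $$
BEGIN
    BEGIN
        -- Expected common case: the domain already exists, so this succeeds.
        INSERT INTO db.hits ("md5", "seen") VALUES (oMD5, now());
        RETURN;
    EXCEPTION
        WHEN foreign_key_violation THEN
            NULL;  -- Parent row is missing; create it below, then retry.
    END;

    BEGIN
        INSERT INTO db.domains ("md5", "domain", "inserted")
            VALUES (oMD5, oDomain, now());
    EXCEPTION
        WHEN unique_violation THEN
            NULL;  -- Another session created the parent row first.
    END;

    -- Retry the insert that failed; the parent row exists now.
    INSERT INTO db.hits ("md5", "seen") VALUES (oMD5, now());
END;
$$ LANGUAGE plpgsql;
```

Here an exception is raised only the first time a given MD5 is seen, so this ordering should pay off when most calls refer to domains that already exist.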

mu is too short