PostgreSQL - Duplicate Unique Key

Question

On my table I have a secondary unique key labeled md5. Before inserting, I check to see if the MD5 exists, and if not, insert it, as shown below:

-- Attempt to find this item
SELECT INTO oResults (SELECT domain_id FROM db.domains WHERE "md5"=oMD5);

IF (oResults IS NULL) THEN

    -- Not found, so insert this domain
    INSERT INTO db.domains ("md5", "domain", "inserted")
        VALUES (oMD5, oDomain, now());

    RETURN currval('db.domains_seq');

END IF;

This works great for single-threaded inserts. My problem is when two external applications call my function concurrently and happen to have the same MD5. I end up with a situation where:

App 1: Sees the MD5 does not exist

App 2: Inserts this MD5 into table

App 1: Now goes to insert the MD5 into the table since it thinks it doesn't exist, but gets an error because right after it saw that the MD5 did not exist, App 2 inserted it.

Is there a more effective way of doing this?

Can I catch the error on insert and if so, then select the domain_id?

Thanks in advance!

This also seems to be covered at Insert, on duplicate update in PostgreSQL?

mu is too short · Accepted Answer · 2011-07-21T06:15:09.983

2

You could just go ahead and try to insert the MD5 and catch the error: if you get a "unique constraint violation" error, ignore it and keep going; if you get some other error, bail out. That way you push the duplicate checking right down to the database and your race condition goes away.

Something like this:

1. Attempt to insert the MD5 value.
   - If you get a unique violation error, then ignore it and continue on.
   - If you get some other error, bail out and complain.
   - If you don't get an error, then continue on.
2. Do your SELECT INTO oResults (SELECT domain_id FROM db.domains WHERE "md5"=oMD5) to extract the domain_id.
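
The steps above could be sketched as a PL/pgSQL function along these lines. The function name db.get_or_create_domain, the text parameter types, and the integer return type are assumptions for illustration; only the table, its columns, and the variable names come from the question.

```sql
-- Sketch only: function name, parameter types, and return type are assumed.
CREATE OR REPLACE FUNCTION db.get_or_create_domain(oMD5 text, oDomain text)
RETURNS integer AS $$
DECLARE
    oResults integer;
BEGIN
    -- Step 1: try the insert and let the unique key on "md5" do the checking.
    BEGIN
        INSERT INTO db.domains ("md5", "domain", "inserted")
            VALUES (oMD5, oDomain, now());
    EXCEPTION
        WHEN unique_violation THEN
            NULL;  -- another session already inserted this MD5; ignore it
        -- any other error is not caught here and propagates to the caller
    END;

    -- Step 2: the row exists now, whoever inserted it, so look up its id.
    SELECT domain_id INTO oResults FROM db.domains WHERE "md5" = oMD5;
    RETURN oResults;
END;
$$ LANGUAGE plpgsql;

-- Example call (hypothetical values):
-- SELECT db.get_or_create_domain(md5('example.com'), 'example.com');
```

A block with an EXCEPTION clause is more expensive to enter and exit than a plain block, which is where the performance hit mentioned below comes from.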

There might be a bit of a performance hit but "correct and a little slow" is better than "fast but broken".

Eventually you might end up with more exceptions than successful inserts. Then you could try to insert into the table that references your db.domains (through a foreign key) and trap the FK violation there. If you get an FK violation, do the old "insert and ignore unique violations" on db.domains and then retry the insert that gave you the FK violation. This is the same basic idea; it's just a matter of working out which approach will probably throw the fewest exceptions and going with that.
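
A rough sketch of that variant follows. Everything about the referencing table is hypothetical: the thread never names it, so db.hits, its "md5" and "seen" columns, the assumption that its foreign key points at db.domains("md5"), and the function name db.record_hit are all made up for illustration.

```sql
-- Sketch only: db.hits, its columns, and db.record_hit are hypothetical.
-- Assumes db.hits."md5" has a foreign key referencing db.domains("md5").
CREATE OR REPLACE FUNCTION db.record_hit(oMD5 text, oDomain text)
RETURNS void AS $$
BEGIN
    LOOP
        BEGIN
            -- Common case: the domain already exists, so this just works.
            INSERT INTO db.hits ("md5", "seen") VALUES (oMD5, now());
            RETURN;
        EXCEPTION
            WHEN foreign_key_violation THEN
                -- The domain is missing: insert it, ignoring a concurrent
                -- insert of the same MD5, then loop to retry the first insert.
                BEGIN
                    INSERT INTO db.domains ("md5", "domain", "inserted")
                        VALUES (oMD5, oDomain, now());
                EXCEPTION
                    WHEN unique_violation THEN
                        NULL;
                END;
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Once most domains are already in db.domains, the first insert usually succeeds and the exception path is rarely taken.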

edited Jul 21 '11 at 06:15

answered Jul 21 '11 at 06:03

mu is too short


That was my next thought, just got to research how to find the error. Thanks. – Anthony Greco Jul 21 '11 at 06:09
@Anthony: I added a little update that might be helpful when your `db.domains` gets close to covering all the domains that you'll be dealing with. – mu is too short Jul 21 '11 at 06:15

http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql also seems to cover my same issue – Anthony Greco Jul 21 '11 at 06:30
@Anthony: Looks like they came to the same conclusion: let the database deal with the concurrency. – mu is too short Jul 21 '11 at 06:40
That is actually what I was planning on doing, just wasn't sure if it was the best option. Just updated all my functions to do the LOOP and it's working well now =) – Anthony Greco Jul 21 '11 at 06:47
@Anthony: Now you know that you're smarter than you thought you were. Or we're all crazy together (which is just as good). – mu is too short Jul 21 '11 at 06:50
