2

I've been getting a handful of errors on Postgresql, that seem to be tied to this race condition.

I have a process/daemon written in Twisted Python. The easiest way to describe it is as a Web Crawler - it pulls a page, parses the links, and logs what it's seen. Because of HTTP blocking, Twisted runs multiple "concurrent" processes deferred to threads.

Here's the race condition...

When I encounter a url shortener , this logic happens:

result= """SELECT * FROM shortened_link WHERE ( url_shortened = %(url)s ) LIMIT 1;"""
if result:
   pass
else:
   result= """INSERT INTO shortened_link ( url_shortened ..."

A surprising number or psycopg2.IntegrityError's are raised, because the unique index on url_shortened gets violated.

The select/insert does actually run that close together. From what I can tell, it looks like 2 shortened links get queued next to one another.

Process A: Select, returns Null
Process B: Select, returns Null
Process A: Insert , success
Process B: Insert , integrity error

Can anyone suggest any tips/tricks to handle this ? I'd like to avoid explicit locking, because I know that'll open up a whole other set of problems.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Jonathan Vanasco
  • 15,111
  • 10
  • 48
  • 72

3 Answers3

2

Do it all in a single command:

result= """
INSERT INTO shortened_link ( url_shortened ...
SELECT %(url)s
where not exists (
    select 1
    from shortened_link
    WHERE url_shortened = %(url)s
);"""

It will only insert if that link does not exist.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • 4
    This should take care of *most* of the problem. A tiny chance for a race condition still remains, though. I wrote an answer for a similar request [here](http://stackoverflow.com/questions/15939902/is-my-function-prone-to-race-conditions/15950324#15950324) – Erwin Brandstetter May 20 '13 at 16:59
  • 1
    +1 Erwin's function. It's the only bullet-proof solution here short of an outright table or advisory lock. The chance for race conditions in this answer's select statement is by no means negligible if there are enough concurrent queries. – Denis de Bernardy May 20 '13 at 18:27
2

There's really not a solution that avoids the need to be able to handle the possibility of a unique constraint violation error. If your framework can't do it then I'd wrap the SQL in a PL/pgSQL function or procedure that can.

Given that you can handle the error you might as well not test for the existence of the unique value and just attempt the insert, letting any error be handled by the EXCEPTION clause.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • Yeah, I coded an exception clause for now. This may be happening elsewhere - it really seems clustered on this one use case. sigh. – Jonathan Vanasco May 20 '13 at 16:30
  • 1
    I believe I devised such a solution [here](http://stackoverflow.com/questions/15939902/is-my-function-prone-to-race-conditions/15950324#15950324). – Erwin Brandstetter May 20 '13 at 17:05
1

You either need a mutex lock of some kind, or you will have to live with the redundancies that will occur due to the race condition.

If you choose to go with the mutex lock - you don't necessarily need to use a database-level lock. You can simply lock down the Twisted process to block other threads handling a similar shortened url.

If you choose to avoid the lock, remove the unique constraint on the url_shortened field. Periodically, you can move these records to a 'clean' table that contains a single unique copy of each shortened url.

PinnyM
  • 35,165
  • 3
  • 73
  • 81