I've been getting a handful of errors on Postgresql, that seem to be tied to this race condition.
I have a process/daemon written in Twisted Python. The easiest way to describe it is as a Web Crawler - it pulls a page, parses the links, and logs what it's seen. Because of HTTP blocking, Twisted runs multiple "concurrent" processes deferred to threads.
Here's the race condition...
When I encounter a url shortener , this logic happens:
result= """SELECT * FROM shortened_link WHERE ( url_shortened = %(url)s ) LIMIT 1;"""
if result:
pass
else:
result= """INSERT INTO shortened_link ( url_shortened ..."
A surprising number or psycopg2.IntegrityError's are raised, because the unique index on url_shortened gets violated.
The select/insert does actually run that close together. From what I can tell, it looks like 2 shortened links get queued next to one another.
Process A: Select, returns Null
Process B: Select, returns Null
Process A: Insert , success
Process B: Insert , integrity error
Can anyone suggest any tips/tricks to handle this ? I'd like to avoid explicit locking, because I know that'll open up a whole other set of problems.