I have developed a basic proxy tester in Python. Proxy IPs and ports, as well as their date_of_last_test
(e.g. 31/12/2011 10:10:10) and result_of_last_test
(OK or KO) are stored in a single SQLite table. (I realize I could store a lot more details on the tests results and keep an history / stats, but this simple model suits my needs).
Here is the simplified code of the tester main loop, where I loop over the proxies and update their status:
while True:
# STEP 1: select
myCursor.execute("SELECT * from proxy ORDER BY date_of_last_test ASC;")
row = myCursor.fetchone()
# STEP 2: update
if isProxyWorking(row['ip'], row['port']): # this test can last a few seconds
updateRow(row['ip'], row['port'], 'OK')
else:
updateRow(row['ip'], row['port'], 'KO')
My code works fine when run as a single process. Now, I would like to be able to run many processes of the program, using the same SQLite database file. The problem with the current code is the lack of a locking mechanism that would prevent several processes from testing the same proxy.
What would be the cleanest way to put a lock on the row at STEP 1 / SELECT time, so that the next process doing a SELECT gets the next row ?
In other words, I'd like to avoid the following situation:
Let's say it's 10PM, and the DB contains 2 proxies:
Proxy A
tested for the last time at 8PM and proxy B
tested at 9PM.
I start two processes of the tester to update their statuses:
- 10:00 - Process 1 gets the "oldest" proxy to test it:
A
- 10:01 -
Process 2 gets the "oldest" proxy to test it: !!!
A
!!! (here I'd like Process 2 to get proxyB
becauseA
is already being tested - though not updated yet in db) - 10:10 - Testing of
A
by Process 1 is over, its status is updated in DB - 10:11 - Testing of
A
by Process 2 is over, its status is updated (!!! AGAIN !!!) in DB
There is no actual error/exception in this case, but a waste of time I want to avoid.