0

I have developed a basic proxy tester in Python. Proxy IPs and ports, as well as their date_of_last_test (e.g. 31/12/2011 10:10:10) and result_of_last_test (OK or KO) are stored in a single SQLite table. (I realize I could store a lot more details on the tests results and keep an history / stats, but this simple model suits my needs).

Here is the simplified code of the tester main loop, where I loop over the proxies and update their status:

while True:
    # STEP 1: select
    myCursor.execute("SELECT * from proxy ORDER BY date_of_last_test ASC;")
    row = myCursor.fetchone()

    # STEP 2: update
    if isProxyWorking(row['ip'], row['port']): # this test can last a few seconds
        updateRow(row['ip'], row['port'], 'OK')
    else:
        updateRow(row['ip'], row['port'], 'KO')

My code works fine when run as a single process. Now, I would like to be able to run many processes of the program, using the same SQLite database file. The problem with the current code is the lack of a locking mechanism that would prevent several processes from testing the same proxy.

What would be the cleanest way to put a lock on the row at STEP 1 / SELECT time, so that the next process doing a SELECT gets the next row ?


In other words, I'd like to avoid the following situation:

Let's say it's 10PM, and the DB contains 2 proxies: Proxy A tested for the last time at 8PM and proxy B tested at 9PM.

I start two processes of the tester to update their statuses:

  • 10:00 - Process 1 gets the "oldest" proxy to test it: A
  • 10:01 - Process 2 gets the "oldest" proxy to test it: !!! A !!! (here I'd like Process 2 to get proxy B because A is already being tested - though not updated yet in db)
  • 10:10 - Testing of A by Process 1 is over, its status is updated in DB
  • 10:11 - Testing of A by Process 2 is over, its status is updated (!!! AGAIN !!!) in DB

There is no actual error/exception in this case, but a waste of time I want to avoid.

Sébastien
  • 13,831
  • 10
  • 55
  • 70
  • I figured it would be easier for the comprehension of the code than True/False :P – Sébastien Jan 26 '12 at 09:57
  • I have similar problem and decided to select not just oldest record but oldest and which id % NUM_OF_THREADS. In this way we partition all records between worker threads so there is no collision. I know that this arises some problems like what if some worker threads has nothing to do but statistically can be OK. (Btw. KO & OK is not good because it's easy to mistake. It's not defensive programming. However defensive programming is often boring. ;)) – omikron Nov 21 '13 at 14:55

1 Answers1

1

SQlite only allows one process to update anything in the database at a time, From the FAQ

Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time,

and

When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.

So if there only a few updates then this will work otherwise you need to change to a more capable database.

so there is only one lock which is on the whole database

mmmmmm
  • 32,227
  • 27
  • 88
  • 117
  • 2
    I get that you 2 processes cannot write simultaneously to the same DB file. However, this limitation should not prevent me to run multiple processes of the tester, provided I implement a retry mechanism whenever I get a SQLITE_BUSY exception. I'll edit my question to explain better what my real problem is. – Sébastien Dec 31 '11 at 14:12
  • @Sebastien - your question h=just asks how to do a row lock and not that you have tried and got an error, if so show what the error was – mmmmmm Dec 31 '11 at 14:14