I have a simple procedure that needs to process the records of a table, ideally with multiple instances of the process running in parallel and no two of them picking up the same record. The way I've done this with MySQL is fairly common (although I perceive the token field to be more of a hack):
Adding a couple of fields to the table:
CREATE TABLE records (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
...actual fields...
processed_at DATETIME DEFAULT NULL,
process_token TEXT DEFAULT NULL
);
And then a simple processing script:
process_salt = md5(rand())  # or something like a process id

def get_record():
    token = md5(microtime() + process_salt)
    db.exec("UPDATE records SET process_token = ?
             WHERE processed_at IS NULL LIMIT 1", token)
    return db.exec("SELECT * FROM records WHERE process_token = ?", token)

while (row = get_record()) is valid:
    # ...do processing on row...
    db.exec("UPDATE records SET processed_at = NOW(), process_token = NULL
             WHERE id = ?", row.id)
I'm implementing such a process in a system that uses a PostgreSQL database. I know Pg could be considered more mature than MySQL when it comes to locking, thanks to MVCC. Can I use row locking or some other feature in Pg instead of the token field?
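What I have in mind is something along these lines, assuming I've understood FOR UPDATE SKIP LOCKED correctly (I gather it was added in PostgreSQL 9.5). The table and columns are the same as above, and the ? placeholder stands for the driver's parameter binding:

BEGIN;

-- claim the next unprocessed row, skipping rows locked by other workers
SELECT * FROM records
WHERE processed_at IS NULL
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...do processing on the returned row...

UPDATE records SET processed_at = NOW() WHERE id = ?;

COMMIT;

That would let me drop the process_token column entirely, though it also means each worker holds a transaction open for the duration of the processing step. Is this the idiomatic approach, or is there a better Pg feature for the job?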