I'm looking for a way to select one table row explicitly for one thread. I've written a crawler, that works with about 50 parallel processes. Every process has to take one row out of a table and process it.
CREATE TABLE `crawler_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`class_id` tinyint(3) unsigned NOT NULL,
`server_id` tinyint(3) unsigned NOT NULL,
`proc_id` mediumint(8) unsigned NOT NULL,
`prio` tinyint(3) unsigned NOT NULL,
`inserted` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `proc_id` (`proc_id`),
KEY `app_id` (`app_id`),
KEY `crawler` (`class_id`,`prio`,`proc_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
Now my processes do the following:
- start DB transaction
- do a select like
SELECT * FROM crawler_queue WHERE class_id=2 AND prio=20 AND proc_id=0 ORDER BY id LIMIT 1 FOR UPDATE
- then update this row with
UPDATE crawler_queue SET server_id=1,proc_id=1376 WHERE id=23892
- commit transaction
This should help that no other process can grab a row that is processed yet. Doing an EXPLAIN on the select shows
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE crawler_queue ref proc_id,crawler proc_id 3 const 617609 Using where
But the processes seem to cause too high parallelism, because sometimes I can see two types of errors/warnings in my log (every 5 minutes or so):
mysqli::query(): (HY000/1205): Lock wait timeout exceeded; try restarting transaction (in /var/www/db.php l
ine 81)
mysqli::query(): (40001/1213): Deadlock found when trying to get lock; try restarting transaction (in /var/www/db.php line 81)
My question is: can anybody point me in the right direction to minimize these locking problems? (in production state, the parallelism would be 3-4 times higher than now, so I assume, that there would be much more locking problems)
I modified SELECT
to use index crawler
by hint USE INDEX(crawler)
. My problem now are lockwait timeouts anymore (deadlocks disappeared).
EXPLAIN
with USE INDEX()
shows now (no. of rows is higher, because table contains more data now):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE crawler_queue ref proc_id,crawler crawler 5 const,const,const 5472426 Using where