
I have a queue system, and the slowest query, the one that troubles me, is this one:

UPDATE workers SET work = $workid, last_used = NOW() 
WHERE status = 1 AND work IS NULL ORDER BY last_used ASC LIMIT 1

When there is no load, the query executes in about 0.04 seconds, but when many PHP scripts are executing this query, the execution time climbs higher and higher, up to 40 seconds, which is a big problem.

The table has around 40,000 entries and there is a separate index on status and on work. EXPLAIN for the query shows the optimizer using an index-merge intersect of the status and work indexes and getting about 3,000 rows to process with ORDER BY; the Extra column also shows Using where; Using filesort.

The VPS behind it has 8 cores @ 2.5 GHz and 12 GB RAM. When the query runs very slowly there is only low CPU usage; the CPU usage is much higher when the load begins to ramp up.

How can I greatly improve the performance of this query under load, when many PHP scripts are running it? Can I tweak general MySQL settings to fix it, or is the table architecture bad, or is an index missing? I'd like to be able to run about 300 of these queries per second without losing performance.

asked by maddo7
  • Sounds like something else is locking the table. Probably that other query needs to be optimized. – Anthony Aug 31 '14 at 20:14
  • There are 5 queries executed sequentially in a while loop to get the jobs. I've logged the execution time for all of them, and it always showed long execution times only for the query I've posted here; the other ones stay fast. Only these 5 queries get executed very often, so the query that slows down must be among those 5. Since the query I've posted was always the slow one, I suspect it to be the problem. Makes sense? – maddo7 Aug 31 '14 at 20:20
  • Does the workers table have a unique id field? That would help for using a subquery to get rid of the ORDER BY. – Anthony Aug 31 '14 at 20:26
  • It has worker_id as primary key, but the point of last_used is that I always want to get the least recently used worker first. The query updates a worker entry and sets last_used = NOW(), so the same worker won't be returned when the query is issued multiple times unless it is the least recently used one again. – maddo7 Aug 31 '14 at 20:29

2 Answers


The problem with the query is finding the appropriate row(s) to update. For this query:

UPDATE workers
    SET work = $workid, last_used = NOW() 
    WHERE status = 1 AND work IS NULL
    ORDER BY last_used ASC
    LIMIT 1;

You want the following composite index: workers(status, work, last_used).

This should speed the query and prevent multiple updates from locking each other out.
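A sketch of how that index could be added (the index name idx_status_work_last_used is just a placeholder of mine):

-- Composite index: the equality/IS NULL columns first, the ORDER BY column last.
ALTER TABLE workers
    ADD INDEX idx_status_work_last_used (status, work, last_used);

With this index, MySQL can in principle find the status = 1 AND work IS NULL rows and read them already sorted by last_used, instead of intersecting two single-column indexes and filesorting the result.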

answered by Gordon Linoff
  • Even though this query now shows 10,000 rows instead of the 3,000 with my old query, it seems to perform better, but it still uses filesort, and when many queries are active the execution time goes up and everything slows down. – maddo7 Aug 31 '14 at 21:58
  • Even with the composite index, it is still using "filesort"? That surprises me. – Gordon Linoff Sep 01 '14 at 01:32
  • Yes, filesort is still in there. The full explain shows: id => 1; select_type => SIMPLE; table => workers; type => range; possible_keys => status; key => status; key_len => 27; ref => const,const; rows => 10568; Extra => Using where; Using filesort. Is there any more info you need? – maddo7 Sep 01 '14 at 09:07
  • It seems it only uses filesort for the UPDATE query. If I exchange the UPDATE for a SELECT, the "Using filesort" is gone. Why could that be? – maddo7 Sep 01 '14 at 21:05
  • @Matthias . . . The documentation on `ORDER BY` optimization only talks about `SELECT` and not `UPDATE` (http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html). – Gordon Linoff Sep 02 '14 at 03:17
  • So how can I improve this query? Would it be wise to use SELECT ... FOR UPDATE and then UPDATE workers where worker_id = $worker_id? – maddo7 Sep 02 '14 at 08:46
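A minimal sketch of the SELECT ... FOR UPDATE pattern that last comment asks about, assuming the surrounding PHP runs both statements inside one transaction and feeds the selected id back into the UPDATE as $worker_id (both $workid and $worker_id are placeholders here):

START TRANSACTION;

-- Lock the least recently used eligible worker; concurrent transactions wait on this row lock.
SELECT worker_id
    FROM workers
    WHERE status = 1 AND work IS NULL
    ORDER BY last_used ASC
    LIMIT 1
    FOR UPDATE;

-- $worker_id stands for the value returned by the SELECT above.
UPDATE workers
    SET work = $workid, last_used = NOW()
    WHERE worker_id = $worker_id;

COMMIT;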

I think this should avoid the ORDER BY, which forces the query to collect all matching rows instead of just the one with the oldest last_used:

UPDATE workers
    INNER JOIN (
        SELECT MIN(last_used), worker_id
            FROM workers
    ) AS newest_worker
        ON newest_worker.worker_id = workers.worker_id
SET workers.work = $workid, workers.last_used = NOW()
WHERE workers.status = 1 AND workers.work IS NULL
answered by Anthony
  • The LIMIT 1 is missing, else all workers get updated at once. So you think the problem is in the ORDER BY? – maddo7 Aug 31 '14 at 20:59
  • MAX only returns 1 row, so no LIMIT necessary. – Anthony Aug 31 '14 at 21:01
  • The problem is that ORDER BY + LIMIT returns all matching rows first and then reduces them to 1, vs. MAX getting just one matching row. – Anthony Aug 31 '14 at 21:03
  • To get the least used one wouldn't it actually be MIN()? And another thing: The query performs great but now I'm getting deadlocks for this query: Deadlock found when trying to get lock; try restarting transaction. Is there any way to fix it? – maddo7 Aug 31 '14 at 21:12
  • For oldest use MIN(), yes. Updated. – Anthony Aug 31 '14 at 21:17
  • Any comment on the deadlock? This whole thing is running as a cron and if it gets interrupted early, the speed boost doesn't bring improvement – maddo7 Aug 31 '14 at 21:18
  • http://stackoverflow.com/questions/2596005/working-around-mysql-error-deadlock-found-when-trying-to-get-lock-try-restarti – Anthony Aug 31 '14 at 21:21
  • Okay, I tried to handle deadlocks in the code and simply re-do the query if a deadlock occurs, but as more queries keep going, more deadlocks occur, and where I had 40 seconds execution time with the old query, I now have 100 deadlocks in a row before a query actually gets executed, which is even slower than it was before. Is there really no way to avoid those deadlocks? – maddo7 Aug 31 '14 at 21:55
  • From what I can tell, the deadlocks are because the subquery is on the same table as the UPDATE query, so unless there is a third option, you're left picking your poison. Maybe the new index proposed in the other answer? – Anthony Aug 31 '14 at 22:09
  • I've already tried that index but it didn't bring me better results and once the deadlocks start to occur almost no query gets executed anymore. This solution would be perfectly fast if I could get rid of the deadlocks – maddo7 Aug 31 '14 at 22:16
  • If you move the where clause to the outer query is it faster or slower? `WHERE workers.status = 1 AND workers.work IS NULL` – Anthony Aug 31 '14 at 22:25
  • Maybe I'm too tired right now but I don't get the changes. Could you update your initial post with a second code block containing the changes you've just mentioned? – maddo7 Aug 31 '14 at 22:47
  • It seems the new query always updates 0 rows so it does nothing. I tested the results, there were possible rows to update. – maddo7 Aug 31 '14 at 22:58
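A sketch of the variant the last few comments circle around, with the eligibility filter moved into the derived table so the joined worker_id actually satisfies it (assuming that is what was meant); this would also explain why the posted version matched 0 rows. The alias oldest_worker is a placeholder:

UPDATE workers
    INNER JOIN (
        SELECT worker_id
            FROM workers
            WHERE status = 1 AND work IS NULL
            ORDER BY last_used ASC
            LIMIT 1
    ) AS oldest_worker
        ON oldest_worker.worker_id = workers.worker_id
SET workers.work = $workid, workers.last_used = NOW();

The derived table does reintroduce an ORDER BY, but only over the already-filtered rows, which the composite (status, work, last_used) index from the other answer can serve.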