Extreme query optimization with IN clause and subquery

Question

My table has more than 15 millions of rows just now. I need to run such query:

SELECT ch1.* FROM citizens_dynamic ch1
WHERE ch1.id IN (4369943, ..., 4383420, 4383700)
AND ch1.update_id_to = (
    SELECT MAX(ch2.update_id_to)
    FROM citizens_dynamic ch2
    WHERE ch1.id = ch2.id AND ch2.update_id_to < 812
)

Basically, for every citizen in IN clause it searches for a row with closest but lower than specified update_id_to.

There is PRIMARY key on 2 columns columns update_id_to, id. At the moment, this query is executed in 0.9s (having 100 ids in IN clause). It's still too slow, I would need to run my scripts for 3 days to complete.

Below you can see my EXPLAIN output. phpmyadmin explain output

id index is just like PRIMARY key, but with reversed columns: id, update_id_to

Do you have any ideas how to make it even faster?

score 1 · Accepted Answer · edited May 23 '17 at 11:58

I've found that MySQL tends to perform better with JOIN than correlated subqueries.

SELECT ch1.*
FROM citizens_dynamic AS ch1
JOIN (SELECT id, MAX(update_id_to) AS update_id_to
      FROM citizens_dynamic
      WHERE id IN (4369943, ..., 4383420, 4383700)
      GROUP BY id) AS ch2
    ON ch1.id = ch2.id
WHERE ch1.id IN (4369943, ..., 4383420, 4383700)

Also, see the other methods in this question:

Retrieving the last record in each group

Extreme query optimization with IN clause and subquery

1 Answers1