I have two tables - users(id, name) and user_comments(user_id, comment_content). The problem is to get limited amount(actually 10) of users (actually names), ordered by the comments count. The user_comments table can contain some user_ids(invalid ids) that the users table doesn't contain, these rows must be ignored.
At first I wrote this simple join query:
SELECT users.name, COUNT(*) AS comment_count
FROM user_comments
JOIN users ON users.id = user_comments.user_id
GROUP BY id
ORDER BY comment_count DESC
LIMIT 0, 10
This query works fine - 0.2 seconds for 18.000 users and 21.000 comments.
Then I decided to optimize the query (not the structure, I can't change/add indexes), and I wrote this query:
SELECT users.name, top_active_users.comment_count
FROM ( SELECT user_id, COUNT(user_id) AS comment_count
FROM user_comments
GROUP BY user_id
ORDER BY comment_count DESC ) AS top_active_users
JOIN users ON users.id = top_active_users.user_id
LIMIT 0, 10
This one works faster(about 5 times) - 0.04 seconds for 18.000 users and 21.000 comments. I think the second one works faster because it doesn't join every row from top_active_users subquery with users table's rows. Because of LIMIT 0, 10 it joins only 10 rows from the subquery result that have valid user_ids. The first query have to join all the rows, then order by and then take only 10 rows.
Now the question, do I think right, and if so, how can I debug the queries to see the advantage of the second query.
The mysql profiler(EXPLAIN, DESCRIBE) doesn't work with LIMIT I guess.
Thank you.