3

I have two tables - users(id, name) and user_comments(user_id, comment_content). The problem is to get limited amount(actually 10) of users (actually names), ordered by the comments count. The user_comments table can contain some user_ids(invalid ids) that the users table doesn't contain, these rows must be ignored.

At first I wrote this simple join query:

SELECT users.name, COUNT(*) AS comment_count
FROM user_comments
JOIN users ON users.id = user_comments.user_id
GROUP BY id
ORDER BY comment_count DESC
LIMIT 0, 10

This query works fine - 0.2 seconds for 18.000 users and 21.000 comments.

Then I decided to optimize the query (not the structure, I can't change/add indexes), and I wrote this query:

SELECT users.name, top_active_users.comment_count
FROM  ( SELECT user_id, COUNT(user_id) AS comment_count
        FROM user_comments
        GROUP BY user_id
        ORDER BY comment_count DESC ) AS top_active_users
JOIN users ON users.id = top_active_users.user_id
LIMIT 0, 10

This one works faster(about 5 times) - 0.04 seconds for 18.000 users and 21.000 comments. I think the second one works faster because it doesn't join every row from top_active_users subquery with users table's rows. Because of LIMIT 0, 10 it joins only 10 rows from the subquery result that have valid user_ids. The first query have to join all the rows, then order by and then take only 10 rows.

Now the question, do I think right, and if so, how can I debug the queries to see the advantage of the second query.

The mysql profiler(EXPLAIN, DESCRIBE) doesn't work with LIMIT I guess.

Thank you.

Domenic Fiore
  • 534
  • 5
  • 11
Hovo
  • 790
  • 4
  • 21

1 Answers1

2

You are basically correct in your assumption about why this is faster.

In your second, faster, query, you are processing only one column from one table. It is probably indexed, so the GROUP BY operation doesn't require any presorting. Then you use that result set to pull information from your users table. And, because of the LIMIT, you only need to pull ten rows' worth of data.

AFAIK, EXPLAIN works correctly with LIMIT. A LIMITed query is an important optimization case.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I've found a stackoverflow question - http://stackoverflow.com/questions/6775498/mysql-command-explain-ignore-limit and this article - http://www.percona.com/blog/2006/07/24/mysql-explain-limits-and-errors/ concerning EXPLAIN and LIMIT it says - LIMIT is not taken into account while estimating number of rows Even if you have LIMIT which restricts how many rows will be examined MySQL will still print full number. – Hovo Oct 01 '14 at 11:08