2

I try to make a list of users having the most articles published (or something like that).

So I wrote something like this:

SELECT u.id, COUNT(a.id) AS articles
FROM users u
LEFT JOIN articles a ON a.user_id = u.id
GROUP BY u.id
ORDER BY articles DESC
LIMIT 10

The 'explain' command shows it 'Using join buffer (Block Nested Loop)' and 'Using temporary; Using filesort'.

It costs really long time to get the result. I know the reason that causes this problem. But I don't know how to solve it.

Also, I wanted to sort records by user's register time instead of published articles as an option.

And I know I can add a column to users table accumulates the articles number, but I'm not be able to change the original program.

It seems to me that this is the only way to make the list. But also the impossible way to retrieve the result before my time.

Is there another faster way to get a top 10 list based on join queries?

Any suggestion would be appreciated.

chouyangv3
  • 467
  • 5
  • 12
  • I think problem with ORDER BY, check this https://stackoverflow.com/questions/14368211/mysql-very-slow-order-by – Casper Nov 30 '17 at 08:59
  • 1
    Your left join slows your query down - while you actually only need it for users without articles, which are either not part of your top-10 list anyway (or you have only a handful of users in your articles table, being fast on its own). So a solution could be to not join: `select a.user_id, count(a.id) from articles a group by a.user_id order by count(a.id)`. (And if you get less than 10 results, run a 2nd query to add some 0-article users). The next step would be, as you guessed, precalculated statistics. (You can do it via triggers, you don't need to change the whole app, just the query). – Solarflare Nov 30 '17 at 09:10
  • 1
    Also, just to be sure: you *do* have an index on `articles(user_id, id)`, right? (You should always include indexes and the whole explain output to your question) – Solarflare Nov 30 '17 at 09:16
  • @solarflare Would there be any point in having a composite index which included a column whose cardinality was one? – Strawberry Nov 30 '17 at 09:24
  • @Strawberry: Yes, it may help the optimizer to understand that there is no need to read the table in order to evaluate `COUNT(a.id)`. We could of course replace `COUNT(a.id)` with `COUNT(a.user_id)` to achieve the same, but this would decrease readablity, as it's actually articles we count per user. (It may still be that the optimizer handles `COUNT(a.user_id)` better, so it may be worth consideration.) – Thorsten Kettner Nov 30 '17 at 09:42
  • @Strawberry Since for InnoDB, it is included anyway (which I know you are aware of), in this case it is just convenience/personal preference/... to include the pk in the definition or not. – Solarflare Nov 30 '17 at 10:52
  • @CasperSL No, I removed it and results the same. But it has been solved. Thanks – chouyangv3 Nov 30 '17 at 15:44
  • @Solarflare I wanted to sort by user's register time (maybe something else) as an option so it has to join the articles table. Triggers could work but the answer below is better. Thanks. – chouyangv3 Nov 30 '17 at 16:06

1 Answers1

0

Try this:

SELECT
    u.id, t.articles
FROM USERS u
LEFT JOIN
    (
    SELECT user_id, COUNT(id) AS `articles`
    FROM articles
    GROUP BY user_id
    ) t ON u.id = t.user_id
ORDER BY t.articles DESC
LIMIT 10
tatskie
  • 405
  • 1
  • 7
  • 16
  • I get it! MySQL creates a temporary table from articles first, so the joined table is much smaller than the original one. It must be! – chouyangv3 Nov 30 '17 at 15:38