1

I have 1 table of users, and 10 tables (articles, news, ...) where I save user's publications. I want to show how many publications has each user, in one query:

| ID_USER | COUNT(id_article) | COUNT(id_news) | etc...
-------------------------------------------------
| 1       | 0                 | 3               |
| 2       | 2                 | 9               |
| 3       | 14                | 5               |
| 4       | 0                 | 0               |

If I use this query to show the number of articles...

SELECT id_user,COUNT(articles.id_article) FROM users 
LEFT JOIN articles ON articles.id_user_article=users.id_user 
GROUP BY users.id_user

... it shows the information correctly. But if I start to add the second table...

    SELECT id_user,COUNT(articles.id_article),COUNT(news.id_news) FROM users 
    LEFT JOIN articles ON articles.id_user_article=users.id_user
    LEFT JOIN news ON news.id_user_news=users.id_user 
    GROUP BY users.id_user

... it doesn't show the correct information.. and if I join all the rest tables, if shows really strange result (thousands of articles for first user, and NULL for the rest).

Which is the correct way of show this information using only one query? Thank you!

Albatrosz
  • 309
  • 1
  • 3
  • 9

2 Answers2

1

You can use a subselect instead of a left join for each table. The final result will be the same but maybe in that way is clearer.

SELECT u.id_user, 
  (SELECT COUNT(a.id_article) 
  FROM articles a 
  WHERE a.id_user_article = u.id_user) AS articles,
  (SELECT COUNT(n.news) 
  FROM news n  
  WHERE n.id_user_news = u.id_user) AS news
FROM users u

Also if you only uses one column of each table, the subselect is a better option than multiple left joins.

Rahul
  • 76,197
  • 13
  • 71
  • 125
Rafa Paez
  • 4,820
  • 18
  • 35
  • I will prefer joins over correlated subquery. – Rahul Feb 22 '14 at 12:55
  • @Rahul: Do you know if left joins has better performance than subqueries in that case? Also it is not exactly a correlated subquery, isn't? – Rafa Paez Feb 22 '14 at 12:59
  • 1
    With the appropriate indexes (on `id_user_article` for instance), the performance of this query might be better than the performance of the aggregation query -- in MySQL (other databases are better at doing aggregations). If the OP cares about performance, s/he should test both methods. – Gordon Linoff Feb 22 '14 at 13:10
  • I suppose the OP has indexes on these foreign keys, if not it should have, IMHO. http://stackoverflow.com/questions/836167/does-a-foreign-key-automatically-create-an-index – Rafa Paez Feb 22 '14 at 13:13
1

Your problem is that you are joining along different dimensions, which creates cartesian products for each user. The solution by @rafa is actually a fine solution in MySQL. The use of count(distinct) works okay, but only when the counts are not very large. Another approach is to pre-aggregate the results along each dimension:

SELECT u.id_user, a.articles, n.news
FROM users u left outer join
     (select id_user_article, count(*) as articles
      from articles
      group by id_user_article
     ) a
     on u.id_user = a.id_user_article left outer join
     (select id_user_news, count(*) as news
      from news
      group by id_user_news
     ) n
     on u.id_user = n.id_user_news;

EDIT:

If you are using the count(distinct) approach, then you are generating a cross product. If every user had 3 articles and 4 news items, then the users would be multiplied by 12. Probably feasible.

If every user had 300 articles and 400 news items, then every user would be multiplied by 120,000. Probably not feasible.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I think you explained better than me why many left joins is not the best approach for just getting the count of one column. This is a very good solution. – Rafa Paez Feb 22 '14 at 13:06