-2

I have three tables sharing common unique IDs and some other fields, let's say comments, posts, and tags:

comments:

id | user_id | country_id | zone_id | created_at

posts:

id | user_id | country_id | zone_id | created_at

tags:

id | user_id | country_id | zone_id | created_at

What I want to do now is I want to get the count of rows of comments, posts, and tags, grouped by day based on created_at column, and grouped by pairs of user_id, country_id and zone_id, something like:

date | user_id | country_id | zone_id | count(comments.id) | count(posts.id) | count(tags.id)

The problem is that all three tables have millions of rows, so I want to use join without duplications as much as possible. I've come up with this:

select date(c.datetime), c.user_id, c.country_id, c.zone_id, count(distinct(c.id)), count(distinct(p.id)), count(distinct(t.id))
from comments c
inner join posts p
inner join tags t
group by date(c.datetime), c.user_id, c.country_id, c.zone_id;

Surprisingly this gives correct results, but it gives a lot of duplicated rows because of joins - it's bad also because maybe in the future I will want to use SUM instead and I cannot use DISTINCT anymore.

How can I join these three tables by those 3 foreign keys (user_id, country_id, zone_id), so that I will get only distinct rows?

khernik
  • 2,059
  • 2
  • 26
  • 51
  • This seems likely to be a common error where people want some joins or unions, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, but they erroneously try to do all the joining/unioning then all the aggregating or to aggregate over previous aggregations. PS [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Jan 16 '20 at 02:43
  • This is not clear. Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values. Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL that includes DBMS & DDL & tabular initialization. – philipxy Jan 16 '20 at 02:49

1 Answers1

1

I think union all will give accurate counts:

select dte, user_id, country_id,
       sum(is_comment), sum(is_post), sum(is_tag)
from ((select date(created_at) as dte, user_id, country_id, 1 as is_comment, 0 as is_post, 0 as is_tag
       from comments
      ) union all
      (select date(created_at) as dte, user_id, country_id, 1 as is_comment, 0 as is_post, 0 as is_tag
       from posts
      ) union all
      (select date(created_at) as dte, user_id, country_id, 1 as is_comment, 0 as is_post, 0 as is_tag
       from tags
      )
     ) cpt
group by dte, user_id, country_id
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786