-2

So I have four tables:

  1. posts
  2. likes
  3. comments
  4. post_read_count

Each table has a post_id column.

I want to write a query where, for each post, I get the number of likes, the number of comments, and the number of reads.
Finally, I want another column called score, which sums the likes, comments, and reads.

Here is the SQL query I wrote for that:

SELECT posts.post_id,
       posts.title,
       count(likes.like_id) as likes,
       count(comments.comment_id) as comments,
       post_read_count.count as reads,
       (count(likes.like_id) + count(comments.comment_id) + post_read_count.count) as score
FROM community.posts
LEFT JOIN community.likes
    ON posts.post_id = likes.post_id
LEFT JOIN community.comments
    ON posts.post_id = comments.post_id
LEFT JOIN community.post_read_count
    ON posts.post_id = post_read_count.post_id
WHERE posts.is_deleted = false
GROUP BY posts.post_id, post_read_count.count
ORDER BY posts.post_id DESC
LIMIT 100

But I am not getting the correct number of comments, and the sum in score is not working properly either.

I am on Postgres.

Here's some sample data:

posts:
| post_id | title        |
|---------|--------------|
| 101     | some title   |
| 102     | hello there  |
| 103     | good day sir |
| 104     | good bye     |
| 105     | whats up     |


likes:
| like_id | post_id |
|---------|---------|
| 1       | 101     |
| 2       | 101     |
| 3       | 101     |
| 4       | 102     |
| 5       | 102     |
| 6       | 104     |
| 7       | 104     |
| 8       | 105     |
| 9       | 105     |
| 10      | 101     |

comments:
| comment_id | post_id |
|------------|---------|
| 1          | 103     |
| 2          | 103     |
| 3          | 103     |
| 4          | 101     |
| 5          | 102     |
| 6          | 104     |
| 7          | 105     |
| 8          | 105     |
| 9          | 105     |
| 10         | 103     |

post_read_count:
| post_id | count |
|---------|-------|
| 101     | 12    |
| 102     | 54    |
| 103     | 76    |
| 104     | 23    |
| 105     | 87    |

Desired output:

| post_id | title        | likes | comments | reads | score |
|---------|--------------|-------|----------|-------|-------|
| 101     | some title   | 4     | 1        | 12    | 17    |
| 102     | hello there  | 2     | 1        | 54    | 57    |
| 103     | good day sir | 0     | 4        | 76    | 80    |
| 104     | good bye     | 2     | 1        | 23    | 26    |
| 105     | whats up     | 2     | 3        | 87    | 92    |
Sharhad
    Minimally reproducible sample data and desired results are highly encouraged for getting a response. Providing the table creates with the data is preferred. You may have a one-to-many relationship, which is why your counts are not correctly working. But without knowing anything about your data, this is impossible to answer. – Isolated Sep 08 '22 at 16:09
  • I would create two CTEs (for likes and comments) because you have multiple one-to-many joins. – Isolated Sep 08 '22 at 17:29
  • Debug questions require a [mre]--cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. PS Please use standard spelling. – philipxy Sep 09 '22 at 20:59
  • [Two SQL LEFT JOINS produce incorrect result](https://stackoverflow.com/q/12464037/3404097) – philipxy Sep 09 '22 at 21:01
  • Please before considering posting: Pin down code issues via [mre]. Read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect research in posts. SO/SE search is poor & literal & unusual, read the help. Google re googling/searching, including Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) – philipxy Sep 09 '22 at 21:02

2 Answers

4

Because you have one-to-many relationships, I would use two CTEs to get the aggregate counts. You'll need coalesce because some posts have no likes or comments, and the LEFT JOIN then produces NULL; adding NULL to an integer yields NULL.

Schema (PostgreSQL v13)

create table posts (
  post_id integer, 
  title varchar(20)
  );

insert into posts values 
(101, 'some title'),
(102, 'hello there'),
(103, 'good day sir'),
(104, 'good bye'),    
(105, 'whats up');

create table likes (
  like_id integer, 
  post_id integer
  );
  
insert into likes values 
(1, 101),
(2, 101),
(3, 101),
(4, 102),
(5, 102),
(6, 104),
(7, 104),
(8, 105),
(9, 105),
(10, 101);

create table comments (
  comment_id integer, 
  post_id integer
  );
  
insert into comments values 
(1, 103),
(2, 103),
(3, 103),
(4, 101),
(5, 102),
(6, 104),
(7, 105),
(8, 105),
(9, 105),
(10, 103);

create table post_read_count (
  post_id integer, 
  pcount integer
  );
  
insert into post_read_count values 
(101, 12),
(102, 54),
(103, 76),
(104, 23),      
(105, 87);

Query #1

with cte_likes as (
  select post_id, count(*) as total_likes
  from likes
  group by post_id
  ), 
  cte_comments as (
    select post_id, count(*) as total_comments
    from comments
    group by post_id
    )
select p.post_id, 
  p.title,
  coalesce(l.total_likes, 0) as likes, 
  coalesce(c.total_comments, 0) as comments, 
  coalesce(prc.pcount, 0) as reads, 
  coalesce(l.total_likes, 0) + coalesce(c.total_comments, 0) + coalesce(prc.pcount, 0) as score
from posts p
left join cte_likes l
  on p.post_id = l.post_id
left join cte_comments c
  on p.post_id = c.post_id
left join post_read_count prc
  on p.post_id = prc.post_id;
| post_id | title        | likes | comments | reads | score |
|---------|--------------|-------|----------|-------|-------|
| 101     | some title   | 4     | 1        | 12    | 17    |
| 102     | hello there  | 2     | 1        | 54    | 57    |
| 103     | good day sir | 0     | 4        | 76    | 80    |
| 104     | good bye     | 2     | 1        | 23    | 26    |
| 105     | whats up     | 2     | 3        | 87    | 92    |

View on DB Fiddle
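As a spot check outside Postgres, here is a minimal sketch of the same CTE pattern using Python's `sqlite3` (illustration only: just two posts, the `pcount` column from the DDL above, and the `community` schema prefix dropped):

```python
import sqlite3

# Load a trimmed copy of the sample data into an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (post_id INTEGER, title TEXT);
    CREATE TABLE likes (like_id INTEGER, post_id INTEGER);
    CREATE TABLE comments (comment_id INTEGER, post_id INTEGER);
    CREATE TABLE post_read_count (post_id INTEGER, pcount INTEGER);
    INSERT INTO posts VALUES (101, 'some title'), (103, 'good day sir');
    INSERT INTO likes VALUES (1, 101), (2, 101), (3, 101), (10, 101);
    INSERT INTO comments VALUES (4, 101), (1, 103), (2, 103), (3, 103), (10, 103);
    INSERT INTO post_read_count VALUES (101, 12), (103, 76);
""")

# Each CTE aggregates its own table first, so joining the results back to
# posts is one row per post and the counts cannot multiply each other.
rows = con.execute("""
    WITH cte_likes AS (
      SELECT post_id, count(*) AS total_likes FROM likes GROUP BY post_id
    ),
    cte_comments AS (
      SELECT post_id, count(*) AS total_comments FROM comments GROUP BY post_id
    )
    SELECT p.post_id,
           coalesce(l.total_likes, 0) AS likes,
           coalesce(c.total_comments, 0) AS comments,
           coalesce(prc.pcount, 0) AS reads,
           coalesce(l.total_likes, 0) + coalesce(c.total_comments, 0)
             + coalesce(prc.pcount, 0) AS score
    FROM posts p
    LEFT JOIN cte_likes l ON p.post_id = l.post_id
    LEFT JOIN cte_comments c ON p.post_id = c.post_id
    LEFT JOIN post_read_count prc ON p.post_id = prc.post_id
    ORDER BY p.post_id
""").fetchall()
print(rows)  # [(101, 4, 1, 12, 17), (103, 0, 4, 76, 80)]
```

The two rows match the desired output for posts 101 and 103, including the zero likes for 103 that the coalesce supplies.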

Isolated
0

The trick is that each extra join multiplies the results of the joins before it. It helps to think through what the engine does to build the result set.

Let's say you run this when you have only one post, but that post has received 3 likes and 2 comments. After the first join from posts to likes, and before the GROUP BY, the result set so far has 3 rows:

| post_id | like_id |
|---------|---------|
| 1       | 1       |
| 1       | 2       |
| 1       | 3       |

Next we join to comments. This join starts from the set we built previously, not from the base posts table, so each of the three rows from the previous join picks up its own set of comments. We end up with this:

| post_id | like_id | comment_id |
|---------|---------|------------|
| 1       | 1       | 1          |
| 1       | 1       | 2          |
| 1       | 2       | 1          |
| 1       | 2       | 2          |
| 1       | 3       | 1          |
| 1       | 3       | 2          |

You can see how each additional join multiplies the number of rows. But we're not done yet: there's still the GROUP BY to handle. When we group by post_id, we're grouping the entire six-row set from the previous step, not the posts table. COUNT(like_id) counts every non-null value in the group, so each like_id (1, 2, and 3) is counted twice, once for each comment, giving 6 likes instead of 3. Similarly, each comment_id (1 and 2) is counted three times, once for each like, giving 6 comments instead of 2.
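The fan-out can be checked concretely. This is a small sketch of the one-post, 3-likes, 2-comments example using Python's `sqlite3` (not the asker's Postgres schema, just an illustration of the join behavior):

```python
import sqlite3

# Reproduce the one-post example: 3 likes and 2 comments.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (post_id INTEGER);
    CREATE TABLE likes (like_id INTEGER, post_id INTEGER);
    CREATE TABLE comments (comment_id INTEGER, post_id INTEGER);
    INSERT INTO posts VALUES (1);
    INSERT INTO likes VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO comments VALUES (1, 1), (2, 1);
""")

# The naive double LEFT JOIN pairs every like row with every comment row,
# so the group holds 3 * 2 = 6 rows and both counts come back as 6.
row = con.execute("""
    SELECT count(l.like_id) AS likes, count(c.comment_id) AS comments
    FROM posts p
    LEFT JOIN likes l ON l.post_id = p.post_id
    LEFT JOIN comments c ON c.post_id = p.post_id
    GROUP BY p.post_id
""").fetchone()
print(row)  # (6, 6) -- not the expected (3, 2)
```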

Thankfully, post_read_count looks like it is 1:1 (or 1:0) with the posts table, so it doesn't add to the problem. With that in mind, for this query most of the issue can be resolved by adding DISTINCT to the aggregate functions:

SELECT p.post_id, p.title,
       count(distinct l.like_id) as likes,
       count(distinct c.comment_id) as comments,
       r.count as reads,
       (count(distinct l.like_id) + count(distinct c.comment_id) + r.count) as score
FROM community.posts p
LEFT JOIN community.post_read_count r ON r.post_id = p.post_id
LEFT JOIN community.likes l ON l.post_id = p.post_id
LEFT JOIN community.comments c ON c.post_id = p.post_id
WHERE p.is_deleted = false
GROUP BY p.post_id, p.title, r.count
ORDER BY p.post_id DESC
LIMIT 100

See it here, with thanks to @Isolated for saving me from having to build the DDL script myself:

https://www.db-fiddle.com/f/oeHccomPBVsskpVa3ftigN/0

This ought to be significantly more efficient than the CTE-based solution. However, some queries will require more drastic steps than just adding DISTINCT to the aggregate functions: either nesting subqueries or using CTEs to compute the aggregate values in isolation, so they don't compound with rows from other tables. In that situation, understanding how the other answer works becomes important.
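Nesting subqueries is one of those more drastic steps. Here is a minimal sketch of that idea (illustrative only, via Python's `sqlite3`, reusing the `pcount` column from the fiddle's DDL): each count is computed in its own scalar subquery, so no fan-out can occur at all:

```python
import sqlite3

# Trimmed copy of the sample data (two posts) in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (post_id INTEGER, title TEXT);
    CREATE TABLE likes (like_id INTEGER, post_id INTEGER);
    CREATE TABLE comments (comment_id INTEGER, post_id INTEGER);
    CREATE TABLE post_read_count (post_id INTEGER, pcount INTEGER);
    INSERT INTO posts VALUES (101, 'some title'), (103, 'good day sir');
    INSERT INTO likes VALUES (1, 101), (2, 101), (3, 101), (10, 101);
    INSERT INTO comments VALUES (4, 101), (1, 103), (2, 103), (3, 103), (10, 103);
    INSERT INTO post_read_count VALUES (101, 12), (103, 76);
""")

# Each scalar subquery aggregates its child table independently, one value
# per post, so the counts never multiply against rows from other tables.
rows = con.execute("""
    SELECT p.post_id,
           (SELECT count(*) FROM likes l
            WHERE l.post_id = p.post_id) AS likes,
           (SELECT count(*) FROM comments c
            WHERE c.post_id = p.post_id) AS comments,
           coalesce((SELECT pcount FROM post_read_count r
                     WHERE r.post_id = p.post_id), 0) AS reads
    FROM posts p
    ORDER BY p.post_id
""").fetchall()
print(rows)  # [(101, 4, 1, 12), (103, 0, 4, 76)]
```

Note that `count(*)` over an empty set is 0, not NULL, so only the read count needs a coalesce here. Whether this outperforms the join-based versions depends on the planner and the available indexes.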

Also note how I used mnemonic table aliases and included ALL the non-aggregated columns in the GROUP BY. Both are good practices you should follow in pretty much every query.

Finally, a complete side issue: it seems very strange to weight a read the same as a like or a comment.

Joel Coehoorn