Joining three tables with aggregation

Question

I have the following items table:

items:
id    pr1  pr2  pr3
-------------------
1     11   22   tt
...

and two tables associated with the items:

comments:
item_id  text
-------------
1       "cool"
1       "very good"
...

tags:
item_id  tag
-------------
1        "life"
1        "drug"
...

Now I want to get a table with columns item_id, pr1, pr2, count(comments), count(tags) with a condition WHERE pr3 = zz. What is the best way to get it? I can do this by creating additional tables, but I was wondering if there is a way achieve this by using only a single SQL statement. I'm using Postgres 9.3.

@Saharsh Shah: I removed your tags again. This is not a question about left joins etc., but about how to write an **SQL** query in **PostgreSQL**, as hovo tagged correctly. Yes, outer joining the tables is one solution but not the only one. — Thorsten Kettner, Dec 22 '14 at 11:32
The most efficient query depends on whether you want to retrieve all or most items at once or just a small selection or a single item. You would have to clarify your question in this respect. Add your version of Postgres while being at it. — Erwin Brandstetter, Dec 22 '14 at 12:55
There are other ways to solve this, but `[left-join]` is still an appropriate tag. — Erwin Brandstetter, Dec 22 '14 at 13:32
For the clarified question, @Thorsten's query is probably faster than the one from @mlinth (both correct). Test with `EXPLAIN ANALYZE` (a couple of times each to rule out caching artifacts). — Erwin Brandstetter, Dec 22 '14 at 14:05

mlinth · Answer 1 · 2014-12-22T11:00:40.177

2

You can just join, but you need to be careful that you don't get double count. E.g. you can use a subqueries to get what you want.

SELECT i.id,i.pr1,i.pr2, commentcount,tagcount FROM
 items i
INNER JOIN
    (SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
ON i.id = c.item_id
INNER JOIN
    (SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
ON i.id = t.item_id

[EDIT] based on the comment, here's the left join version...

SELECT i.id,i.pr1,i.pr2, coalesce(commentcount,0) as commentcount,
      coalesce(tagcount,0) as tagcount FROM
     items i
    LEFT JOIN
        (SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
    ON i.id = c.item_id
    LEFT JOIN
        (SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
    ON i.id = t.item_id

edited Dec 22 '14 at 11:00

answered Dec 22 '14 at 10:36

mlinth

2,968
6
30
30

1

This is a good answer. Make these outer joins and access the counts with `coalesce(commentcount,0)` and `coalesce(tagcount,0)`, so you also get items with zero comments or tags. – Thorsten Kettner Dec 22 '14 at 10:56
I tried this query. It works but it is really slow on Postgres 9.3. I think the cause of the problem is in the fact I didn't mention in the question: there is a `WHERE i.pr3 = smth` clause in the end of the query. In the case of your query the bracketed selections in this query evaluate on all `i.item_id`s instead of all small set of them. So a query where you just `LEFT JOIN` on all three tables then calculate count numbers runs much faster. I don't know why Postgres cannot optimize this. – hovnatan Dec 22 '14 at 13:12

Thorsten Kettner · Accepted Answer · 2014-12-22T10:50:22.067

2

The easiest way is certainly to get the counts in the select clause:

select 
  id, 
  pr1, 
  pr2,
  (select count(*) from comments where item_id = items.id) as comment_count,
  (select count(*) from tags where item_id = items.id) as tag_count
from items;

edited Dec 22 '14 at 10:50

answered Dec 22 '14 at 10:40

Thorsten Kettner

89,309
7
49
73

Why would you do that? That is a SELECT for each row (actually two), rather than giving the optimizer the best possible chance to consolidate things. – davek Dec 22 '14 at 10:42
@davek: I would do this for readability and maintainabilty. You get to the counts directly without having to think about side effects (i.e. counting multi-fold), which easily happens, as you can see from your own answer. Moreover the optimizer *can* decide to join the tables to get the result. This is what SQL is all about; you say *what* you want to see and the dbms decides *how* to get that data effectively. – Thorsten Kettner Dec 22 '14 at 10:52
@Thorsten: good points. However, not all databases are equal and MySQL, for example, is not always capable of doing that (optimizing on the basis of SQL), which is a shame as it compromises SQL's declarativity. – davek Dec 22 '14 at 11:06
Correlated subqueries are fast for *few* items. For lots of items, joining to pre-aggregated tables is faster. – Erwin Brandstetter Dec 22 '14 at 13:07

score 1 · Answer 3 · answered Dec 22 '14 at 11:02

Try this:

SELECT i.id, i.pr1, i.pr2, A.commentCount, B.tagCount 
FROM items i
LEFT OUTER JOIN (SELECT item_id, COUNT(1) AS commentCount 
                 FROM comments 
                 GROUP BY item_id
                ) AS A ON i.id = A.item_id
LEFT OUTER JOIN (SELECT item_id, count(1) as tagCount 
                 FROM tags 
                 GROUP BY item_id
                ) AS B ON i.id = B.item_id;

davek · Answer 4 · 2014-12-22T10:45:32.057

-1

select

  i.id
, i.pr1
, i.pr2
, count(c.item_id) as count_comments
, count(t.item_id) as count_tags

from items i
left outer join comments c on i.id = c.item_id
left outer join tags t on i.id = t.item_id
group by i.id, i.pr1, i.pr2

I've used a LEFT OUTER JOIN to also return counts of zero.

edited Dec 22 '14 at 10:45

answered Dec 22 '14 at 10:34

davek

22,499
9
75
95

This query will not run because pr1 and pr2 are not in the group by expression and will also return the wrong results, because count(distinct(item_id) will be 1... Just tried it out... – mlinth Dec 22 '14 at 10:44
@davek your query returns either `count_comments` and `count_tags` equal to each other and non-zero, or one of them zero. It seems wrong. – hovnatan Dec 22 '14 at 10:59
1

@hovo: Yes the query is wrong. In order to have it working correctly, you would count distinct comments and tags instead (`count(distinct c.text)`and `count(distinct t.tag)`), provided they are unique for an item - or add IDs to the tables and count these. – Thorsten Kettner Dec 22 '14 at 11:02
1

@ThorstenKettner: `count(DISTINCT ...)` could only fix the incorrect query if you have distinct values for `text` and `tag`, which is not specified here. [Detailed explanation.](http://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135) – Erwin Brandstetter Dec 22 '14 at 13:02
@Erwin Brandstetter: Exactly. Which is why I wrote " provided they are unique for an item". – Thorsten Kettner Dec 22 '14 at 13:05

Joining three tables with aggregation

4 Answers4