1

I have the following items table:

items:
id    pr1  pr2  pr3
-------------------
1     11   22   tt
...

and two tables associated with the items:

comments:
item_id  text
-------------
1       "cool"
1       "very good"
...

tags:
item_id  tag
-------------
1        "life"
1        "drug"
...

Now I want to get a table with columns item_id, pr1, pr2, count(comments), count(tags) with a condition WHERE pr3 = zz. What is the best way to get it? I can do this by creating additional tables, but I was wondering if there is a way achieve this by using only a single SQL statement. I'm using Postgres 9.3.

hovnatan
  • 1,331
  • 10
  • 23
  • Yes look for inner join between these three tables. – SMA Dec 22 '14 at 10:34
  • @Saharsh Shah: I removed your tags again. This is not a question about left joins etc., but about how to write an **SQL** query in **PostgreSQL**, as hovo tagged correctly. Yes, outer joining the tables is one solution but not the only one. – Thorsten Kettner Dec 22 '14 at 11:32
  • The most efficient query depends on whether you want to retrieve all or most items at once or just a small selection or a single item. You would have to clarify your question in this respect. Add your version of Postgres while being at it. – Erwin Brandstetter Dec 22 '14 at 12:55
  • There are other ways to solve this, but `[left-join]` is still an appropriate tag. – Erwin Brandstetter Dec 22 '14 at 13:32
  • @ErwinBrandstetter I just clarified my question. Thanks. – hovnatan Dec 22 '14 at 13:53
  • 1
    For the clarified question, @Thorsten's query is probably faster than the one from @mlinth (both correct). Test with `EXPLAIN ANALYZE` (a couple of times each to rule out caching artifacts). – Erwin Brandstetter Dec 22 '14 at 14:05

4 Answers4

2

You can just join, but you need to be careful that you don't get double count. E.g. you can use a subqueries to get what you want.

SELECT i.id,i.pr1,i.pr2, commentcount,tagcount FROM
 items i
INNER JOIN
    (SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
ON i.id = c.item_id
INNER JOIN
    (SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
ON i.id = t.item_id

[EDIT] based on the comment, here's the left join version...

SELECT i.id,i.pr1,i.pr2, coalesce(commentcount,0) as commentcount,
      coalesce(tagcount,0) as tagcount FROM
     items i
    LEFT JOIN
        (SELECT item_id,count(*) as commentcount from comments GROUP BY item_id) c
    ON i.id = c.item_id
    LEFT JOIN
        (SELECT item_id,count(*) as tagcount from tags GROUP BY item_id) t
    ON i.id = t.item_id
mlinth
  • 2,968
  • 6
  • 30
  • 30
  • 1
    This is a good answer. Make these outer joins and access the counts with `coalesce(commentcount,0)` and `coalesce(tagcount,0)`, so you also get items with zero comments or tags. – Thorsten Kettner Dec 22 '14 at 10:56
  • I tried this query. It works but it is really slow on Postgres 9.3. I think the cause of the problem is in the fact I didn't mention in the question: there is a `WHERE i.pr3 = smth` clause in the end of the query. In the case of your query the bracketed selections in this query evaluate on all `i.item_id`s instead of all small set of them. So a query where you just `LEFT JOIN` on all three tables then calculate count numbers runs much faster. I don't know why Postgres cannot optimize this. – hovnatan Dec 22 '14 at 13:12
2

The easiest way is certainly to get the counts in the select clause:

select 
  id, 
  pr1, 
  pr2,
  (select count(*) from comments where item_id = items.id) as comment_count,
  (select count(*) from tags where item_id = items.id) as tag_count
from items;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Why would you do that? That is a SELECT for each row (actually two), rather than giving the optimizer the best possible chance to consolidate things. – davek Dec 22 '14 at 10:42
  • @davek: I would do this for readability and maintainabilty. You get to the counts directly without having to think about side effects (i.e. counting multi-fold), which easily happens, as you can see from your own answer. Moreover the optimizer *can* decide to join the tables to get the result. This is what SQL is all about; you say *what* you want to see and the dbms decides *how* to get that data effectively. – Thorsten Kettner Dec 22 '14 at 10:52
  • @Thorsten: good points. However, not all databases are equal and MySQL, for example, is not always capable of doing that (optimizing on the basis of SQL), which is a shame as it compromises SQL's declarativity. – davek Dec 22 '14 at 11:06
  • Correlated subqueries are fast for *few* items. For lots of items, joining to pre-aggregated tables is faster. – Erwin Brandstetter Dec 22 '14 at 13:07
1

Try this:

SELECT i.id, i.pr1, i.pr2, A.commentCount, B.tagCount 
FROM items i
LEFT OUTER JOIN (SELECT item_id, COUNT(1) AS commentCount 
                 FROM comments 
                 GROUP BY item_id
                ) AS A ON i.id = A.item_id
LEFT OUTER JOIN (SELECT item_id, count(1) as tagCount 
                 FROM tags 
                 GROUP BY item_id
                ) AS B ON i.id = B.item_id;
Saharsh Shah
  • 28,687
  • 8
  • 48
  • 83
-1
select

  i.id
, i.pr1
, i.pr2
, count(c.item_id) as count_comments
, count(t.item_id) as count_tags

from items i
left outer join comments c on i.id = c.item_id
left outer join tags t on i.id = t.item_id
group by i.id, i.pr1, i.pr2

I've used a LEFT OUTER JOIN to also return counts of zero.

davek
  • 22,499
  • 9
  • 75
  • 95
  • This query will not run because pr1 and pr2 are not in the group by expression and will also return the wrong results, because count(distinct(item_id) will be 1... Just tried it out... – mlinth Dec 22 '14 at 10:44
  • @davek your query returns either `count_comments` and `count_tags` equal to each other and non-zero, or one of them zero. It seems wrong. – hovnatan Dec 22 '14 at 10:59
  • 1
    @hovo: Yes the query is wrong. In order to have it working correctly, you would count distinct comments and tags instead (`count(distinct c.text)`and `count(distinct t.tag)`), provided they are unique for an item - or add IDs to the tables and count these. – Thorsten Kettner Dec 22 '14 at 11:02
  • 1
    @ThorstenKettner: `count(DISTINCT ...)` could only fix the incorrect query if you have distinct values for `text` and `tag`, which is not specified here. [Detailed explanation.](http://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135) – Erwin Brandstetter Dec 22 '14 at 13:02
  • @Erwin Brandstetter: Exactly. Which is why I wrote " provided they are unique for an item". – Thorsten Kettner Dec 22 '14 at 13:05