5

I have 3 tables, with Table B & C referencing Table A via Foreign Key. I want to write a query in PostgreSQL to get all ids from A and also their total occurrences from B & C.

   a      |     b      |     c
-----------------------------------    
id | txt  |  id | a_id |  id | a_id  
---+----  |  ---+----- |  ---+------ 
1  |  a   |  1  |  1   |  1  |  3    
2  |  b   |  2  |  1   |  2  |  4    
3  |  c   |  3  |  3   |  3  |  4    
4  |  d   |  4  |  4   |  4  |  4    

Output desired (just the id from A & total count in B & C) :

id | Count
---+-------  
1  |  2      -- twice in B
2  |  0      -- occurs nowhere
3  |  2      -- once in B & once in C
4  |  4      -- once in B & thrice in C

SQL so far SQL Fiddle :

SELECT a_id, COUNT(a_id)
FROM
( SELECT a_id FROM b
  UNION ALL 
  SELECT a_id FROM c
) AS union_table
GROUP BY a_id

The query I wrote fetches from B & C and counts the occurrences. But if the key doesn't occur in B or C, it doesn't show up in the output (e.g. id=2 in output). How can I start my selection from table A & join/union B & C to get the desired output

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
click
  • 2,093
  • 2
  • 15
  • 21

3 Answers3

5

If the query involves large parts of b and / or c it is more efficient to aggregate first and join later.
I expect these two variants to be considerably faster:

SELECT a.id,
     , COALESCE(b.ct, 0) + COALESCE(c.ct, 0) AS bc_ct
FROM   a
LEFT   JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY 1) b USING (a_id)
LEFT   JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY 1) c USING (a_id);

You need to account for the possibility that some a_id are not present at all in a and / or b. count() never returns NULL, but that's cold comfort in the face of LEFT JOIN, which leaves you with NULL values for missing rows nonetheless. You must prepare for NULL. Use COALESCE().

Or UNION ALL a_id from both tables, aggregate, then JOIN:

SELECT a.id
     , COALESCE(ct.bc_ct, 0) AS bc_ct
FROM   a
LEFT   JOIN (
   SELECT a_id, count(*) AS bc_ct
   FROM (
      SELECT a_id FROM b
      UNION ALL
      SELECT a_id FROM c
      ) bc
   GROUP  BY 1
   ) ct USING (a_id);

Probably slower. But still faster than solutions presented so far. And you could do without COALESCE() and still not loose any rows. You might get occasional NULL values for bc_ct, in this case.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Why do you say your 2nd option is `probably slower`? It has less JOINS, shouldn't it be faster? – click Jul 15 '14 at 07:08
  • @click I'm not an expert with PostgreSQL (far from it), but probably using a aggregate function like `GROUP BY` will indeed be slower. But thinking better about it, I don't really get why Erwin's first query is should be faster than my answer (I believe both would have similar execution plans in databases like Oracle and SQL Server), but I *guess* he knows what he is talking about. – rsenna Jul 15 '14 at 13:16
  • @click: the number of joins is just one of many factors. It all depends on cardinalities and data distribution (among others). That's why I talk about typical cases and probabilities. You have to test with your *actual* scenario to know for sure. The first query typically hits a sweet spot between simplicity and effectiveness. – Erwin Brandstetter Jul 15 '14 at 17:18
3

Another option:

SELECT
    a.id,
    (SELECT COUNT(*) FROM b WHERE b.a_id = a.id) +
    (SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
FROM
    a
rsenna
  • 11,775
  • 1
  • 54
  • 60
  • This is neat. I didn't know you can have select statements inside select & add them simply. – click Jul 14 '14 at 20:26
  • 3
    @click: For more than a handful of `a_id` values, **[correlated subqueries](http://en.wikipedia.org/wiki/Correlated_subquery)** (that's the term) are typically *slower* than alternative solutions. Postgres needs to run one subquery per `a_id` value. Be sure to test with a realistic set of data. – Erwin Brandstetter Jul 14 '14 at 23:29
  • Changed the accepted answer to one from @ErwinBrandstetter for performance. But for smaller tables, this wins for compactness. I'll visit this again when I have more data to compare. – click Jul 15 '14 at 07:16
  • Yes, I'm pretty sure Erwin is right. I was not concerned about giving a faster (or even *better*) answer, just an alternative way of getting the same result set. – rsenna Jul 15 '14 at 13:00
2

Use left join with a subquery:

SELECT a.id, COUNT(x.id)
FROM a
LEFT JOIN (
    SELECT id, a_id FROM b
    UNION ALL
    SELECT id, a_id FROM c
) x ON (a.id = x.a_id)
GROUP BY a.id;
tsnorri
  • 1,966
  • 5
  • 21
  • 29
  • Wish I could accept both the answers. When I've more data, I'll come & revisit this to find which performs better but it appears that the other one avoids JOIN and might be faster. – click Jul 14 '14 at 20:43
  • There is even a third possible solution. – wildplasser Jul 14 '14 at 22:14