
I have 3 tables named houses, trees, rivers. All of these tables have a city_id column. I want to group total counts by city. The cities are in another table. My database is PostgreSQL. The result I want looks like this:

city_name    trees    houses      rivers
City-1       1000     200         1
City-2       300      100         2
City-3       4000     210         4

I can get the counts for trees:

SELECT 
    city.name as city_name, 
    count(*) as trees
FROM trees as t, cities as city
WHERE t.city_id = city.city_id
GROUP BY city.name

But I could not join the three tables in the same query.

barteloma

2 Answers


To avoid issues with duplication of rows in a JOIN, it's probably easiest to do the aggregation in subqueries and then JOIN them:

SELECT c.name, 
       COALESCE(t.cnt, 0) AS trees,
       COALESCE(h.cnt, 0) AS houses,
       COALESCE(r.cnt, 0) AS rivers
FROM cities c
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
           FROM trees
           GROUP BY city_id) t ON t.city_id = c.city_id
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
           FROM houses
           GROUP BY city_id) h ON h.city_id = c.city_id
LEFT JOIN (SELECT city_id, COUNT(*) AS cnt
           FROM rivers
           GROUP BY city_id) r ON r.city_id = c.city_id

We use a LEFT JOIN in case a given city has no trees, houses or rivers.

Demo on dbfiddle
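
For completeness, here is a minimal sketch of the schema and sample data the query above assumes, so it can be tried locally. Only city_id and cities.name are given in the question; the other column names (such as the surrogate id columns) are guesses, not taken from it:

-- Assumed minimal schema; only city_id and cities.name come from the question.
CREATE TABLE cities (city_id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE trees  (id bigserial PRIMARY KEY, city_id int REFERENCES cities);
CREATE TABLE houses (id bigserial PRIMARY KEY, city_id int REFERENCES cities);
CREATE TABLE rivers (id bigserial PRIMARY KEY, city_id int REFERENCES cities);

-- Sample data roughly matching the desired output for City-1.
INSERT INTO cities VALUES (1, 'City-1'), (2, 'City-2'), (3, 'City-3');
INSERT INTO trees  (city_id) SELECT 1 FROM generate_series(1, 1000);
INSERT INTO houses (city_id) SELECT 1 FROM generate_series(1, 200);
INSERT INTO rivers (city_id) VALUES (1);

-- With this data the query returns trees = 1000, houses = 200, rivers = 1 for
-- City-1, and 0 in every column for City-2 and City-3, thanks to COALESCE
-- over the LEFT JOINs.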

Nick

An alternative to Nick's answer:

SELECT 
    city.name as city_name, 
    count(distinct t.id) as trees,
    count(distinct h.id) as houses,
    count(distinct r.id) as rivers
FROM  cities as city
left join trees as t on t.city_id = city.city_id
left join rivers as r on r.city_id = city.city_id
left join houses as h on h.city_id = city.city_id
GROUP BY city.name

--

Not sure of the performance implications specifically with Postgres, but here's a (fairly old, so things may have moved on since) article suggesting COUNT(DISTINCT) can be slow in Postgres, together with some options:

postgresql COUNT(DISTINCT ...) very slow
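
If COUNT(DISTINCT ...) does turn out to be a problem on larger data, one of those options, written here as correlated scalar subqueries (just a sketch, equivalent in spirit to Nick's answer, reusing the column names from the question), would be roughly:

-- One scalar subquery per table; no multiplicative join, so no DISTINCT needed.
SELECT c.name AS city_name,
       (SELECT COUNT(*) FROM trees  t WHERE t.city_id = c.city_id) AS trees,
       (SELECT COUNT(*) FROM houses h WHERE h.city_id = c.city_id) AS houses,
       (SELECT COUNT(*) FROM rivers r WHERE r.city_id = c.city_id) AS rivers
FROM cities c;

Each count is computed independently per city, so the trees of a city are never multiplied by its houses and rivers before counting.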

sellotape
  • . . It is not so much that `count(distinct)` is slow (which it is) but that the intermediate result might be so large. For instance, by some definition, the city I'm in has 5.2 million trees, 3 rivers, and something like 250,000 "houses". That generates a lot of intermediate data! – Gordon Linoff Mar 13 '20 at 11:45
  • @GordonLinoff - that's a lot of trees! But yes, granted - it's just an alternative (simpler) syntax that might make sense for smaller data sets. – sellotape Mar 13 '20 at 13:46