About the query
Inspired by @a_horse's comment, I ran a quick test with a real-life table used for a similar purpose.
- 80 categories
- 6862 items
- optimal indexes (including one on items.catid; there are more, but they are not useful here)
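The original table definitions weren't posted; here is a minimal sketch of a schema the test could run against. Only id and catid come from the queries below; the name columns and data types are placeholders.
-- Assumed test schema; id / catid are taken from the queries below,
-- everything else is hypothetical.
CREATE TABLE category (
   id   serial PRIMARY KEY
 , name text NOT NULL
);

CREATE TABLE items (
   id    serial PRIMARY KEY
 , catid int REFERENCES category (id)
 , name  text
);

CREATE INDEX items_catid_idx ON items (catid);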
Three candidates. Results are identical. Query plans and performance vary.
1) Subselect per category (Seregwethrin's original)
Total runtime: 20.351 ms
SELECT c.*
     , (SELECT count(*) FROM items i WHERE i.catid = c.id) AS item_ct
FROM   category c;
2) LEFT JOIN, then GROUP
Total runtime: 36.320 ms
SELECT c.*
     , count(i.catid) AS item_ct  -- count(i.catid), not count(*): empty categories must count 0
FROM   category c
LEFT   JOIN items i ON i.catid = c.id
GROUP  BY c.id;  -- id is the primary key of category
3) GROUP, then LEFT JOIN
Total runtime: 18.588 ms
SELECT c.*
     , COALESCE(i.item_ct, 0) AS item_ct  -- COALESCE: empty categories get 0, not NULL
FROM   category c
LEFT   JOIN (
   SELECT catid
        , count(*) AS item_ct
   FROM   items
   GROUP  BY catid
   ) i ON i.catid = c.id;
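The "Total runtime" figures above match the final line that EXPLAIN ANALYZE prints in this PostgreSQL era, so the measurements can presumably be reproduced like this (shown for query 1):
-- EXPLAIN ANALYZE executes the query and reports actual times,
-- ending in a "Total runtime: ... ms" line.
EXPLAIN ANALYZE
SELECT c.*
     , (SELECT count(*) FROM items i WHERE i.catid = c.id) AS item_ct
FROM   category c;
Repeat each query a few times and compare the best runs, so caching effects don't skew the comparison.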
So my first suggestion wasn't any good. As expected (after some thought), version 3) performs best, and it makes sense: if you count first and then join, far fewer join operations are necessary.
The difference in performance will become more pronounced with bigger tables, especially with more categories.
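To test that claim, one can inflate the tables synthetically. A sketch, assuming the placeholder schema above with fresh tables (so the serial ids of category run from 1 to 1000; row counts are arbitrary):
-- Hypothetical larger data set: 1,000 categories, 1,000,000 items.
INSERT INTO category (name)
SELECT 'cat_' || g
FROM   generate_series(1, 1000) g;

-- Assign each item a random category id between 1 and 1000.
INSERT INTO items (catid, name)
SELECT (random() * 999)::int + 1, 'item_' || g
FROM   generate_series(1, 1000000) g;

ANALYZE category;  -- refresh planner statistics before timing
ANALYZE items;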
For 2) to work, you need PostgreSQL 9.1 or later, and category.id needs to be the primary key. In older versions you would have to list all non-aggregated columns of category in the GROUP BY clause.
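For illustration, query 2) in its pre-9.1 form; c.name stands in for whatever non-aggregated columns category actually has:
SELECT c.id, c.name
     , count(i.catid) AS item_ct
FROM   category c
LEFT   JOIN items i ON i.catid = c.id
GROUP  BY c.id, c.name;  -- every non-aggregated column must be listed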
I switched to LEFT JOIN across all three queries, because the original query includes categories without associated items.
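With a plain [INNER] JOIN, categories without items would drop out of the result entirely, which is exactly what the LEFT JOIN avoids:
SELECT c.*
     , count(i.catid) AS item_ct
FROM   category c
JOIN   items i ON i.catid = c.id  -- inner join: empty categories vanish
GROUP  BY c.id;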
The index on items.catid is only used by 1), where many separate subqueries can profit from it. In the other queries, sequential scans are faster: both tables have to be read in full anyway.
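This is easy to verify: plain EXPLAIN shows the chosen plan without running the query, and the planner setting enable_seqscan can be switched off per session to see what the index-based alternative would cost. A sketch, using query 3):
-- Inspect the chosen plan without executing the query:
EXPLAIN
SELECT c.*, i.item_ct
FROM   category c
LEFT   JOIN (SELECT catid, count(*) AS item_ct FROM items GROUP BY catid) i
       ON i.catid = c.id;

-- For experimentation only: discourage sequential scans in this session ...
SET enable_seqscan = off;
-- ... and compare the estimated cost of the plan for the same query.
EXPLAIN
SELECT c.*, i.item_ct
FROM   category c
LEFT   JOIN (SELECT catid, count(*) AS item_ct FROM items GROUP BY catid) i
       ON i.catid = c.id;
RESET enable_seqscan;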