This also counts NULL values in g_n
(adds 1 if any):
SELECT count(*) -- count(*) counts *all* rows
FROM (SELECT DISTINCT ON (g_n) FROM tbl) sub;
The SELECT
list in the subquery can stay empty because count(*)
in the outer SELECT
does not consider row values anyway, only counts rows (which ist the reason why it is faster than count(expression)
).
It is also typically faster than the DISTINCT
clause in the aggregate expression:
SELECT count(DISTINCT g_n) -- count(expression) ignores NULL values
FROM tbl;
But to get a truly fast count of few distinct g_n
in a big table, emulate an index skip scan. A basic index does the job:
CREATE INDEX ON tbl (g_n);
Then:
WITH RECURSIVE t AS (
(SELECT g_n FROM tbl ORDER BY 1 LIMIT 1)
UNION ALL
SELECT (SELECT g_n FROM tbl WHERE g_n > t.g_n ORDER BY 1 LIMIT 1)
FROM t
WHERE t.col IS NOT NULL
)
SELECT count(g_n) FROM t; -- ignores NULL
To also count possible NULL, conditionally add 1:
...
SELECT count(g_n) + CASE WHEN EXISTS(SELECT FROM tbl WHERE g_n IS NULL) THEN 1 ELSE 0 END
FROM t;
Again, the CASE
expression with EXISTS
is substantially faster for big tables than aggregating like Gordon demonstrated: max((g_n is null)::int)
.
See: