How to count number of groups?

Question

I have a table as follow:

id    g_n   val_name  
---------------------
1      a      "a"
2      a      "aa"
3      b      "123"
4      b      "asad"
5      c      "aas1"

I want to count the number of distinct g_n. (Here it's 3.)

I have tried with:

select count(*)
from table_t
group by g_n

But it gives me the number of items per group.

How can I count the number of distinct g_n?

It is crucial to know whether NULL values can be involved and should be counted. — Erwin Brandstetter, Aug 25 '20 at 12:52

score 3 · Accepted Answer · answered Aug 25 '20 at 11:57

3

Use count(distinct):

select count(distinct g_n)
from table_t;

There is no need to aggregate using group by for this result.

Note: This ignores NULL values. If you want to count them as well, then:

select count(distinct g_n) + max( (g_n is null)::int )

answered Aug 25 '20 at 11:57

Gordon Linoff

1,242,037
58
646
786

1

Indeed, if you group by, you'll get 3 rows of 1,1,1.. Might be worth a note though that that counts all nulls as 1, whereas it could be interpreted that each null is different from other nulls – Caius Jard Aug 25 '20 at 11:59

Erwin Brandstetter · Answer 2 · 2020-08-25T13:48:33.223

This also counts NULL values in g_n (adds 1 if any):

SELECT count(*)             -- count(*) counts *all* rows
FROM  (SELECT DISTINCT ON (g_n) FROM tbl) sub;

The SELECT list in the subquery can stay empty because count(*) in the outer SELECT does not consider row values anyway, only counts rows (which ist the reason why it is faster than count(expression)).

It is also typically faster than the DISTINCT clause in the aggregate expression:

SELECT count(DISTINCT g_n)  -- count(expression) ignores NULL values
FROM   tbl;

But to get a truly fast count of few distinct g_n in a big table, emulate an index skip scan. A basic index does the job:

CREATE INDEX ON tbl (g_n);

Then:

WITH RECURSIVE t AS (
   (SELECT g_n FROM tbl ORDER BY 1 LIMIT 1)

   UNION ALL
   SELECT (SELECT g_n FROM tbl WHERE g_n > t.g_n ORDER BY 1 LIMIT 1)
   FROM   t
   WHERE  t.col IS NOT NULL
   )
SELECT count(g_n) FROM t;  -- ignores NULL

To also count possible NULL, conditionally add 1:

...
SELECT count(g_n) + CASE WHEN EXISTS(SELECT FROM tbl WHERE g_n IS NULL) THEN 1 ELSE 0 END
FROM   t;

Again, the CASE expression with EXISTS is substantially faster for big tables than aggregating like Gordon demonstrated: max((g_n is null)::int).

See:

How to count number of groups?

2 Answers2