postgresql group count distinct using fast way

Question

I got table T with 2 column as follow example:

C1      C2
----------
A       x
A       x
A       y
B       x
B       x

I want to count number of distinct C1 for each value in C2. This result should be like:

C1      distinct count
----------------------
A       2               // count distinct x,x,y = 2
B       1               // count distinct x,x = 1

it is easy to come out with a SQL query like this

select C1, count(distinct C2) from T group by C1

however, as discussed in postgresql COUNT(DISTINCT …) very slow, this query yield poor performance. I would like to use the improved query (count (*) (select distinct ...)) as suggested in that article but I don't know how to form the query with group by.

Adrian Hartanto · Accepted Answer · 2017-07-07T07:31:39.110

6

Try this query if you want avoid DISTINCT keyword

Sample Data:

stackoverflow=# select * from T;
 c1 | c2 
----+----
 A  | x
 A  | x
 A  | y
 B  | x
 B  | x
(5 rows)

Query:

stackoverflow=# WITH count_distinct as (SELECT C1 FROM T GROUP BY c1,c2)
SELECT c1,count(c1) FROM count_distinct GROUP BY C1;  --updated missing group by

Output:

 c1 | count 
----+-------
 B  |     1
 A  |     2
(2 rows)

Same output, but you should try the performance first.

edited Jul 07 '17 at 07:31

answered Jul 07 '17 at 04:51

Adrian Hartanto

435
2
7

2

You missed `GROUP BY c1` for second query – Oto Shavadze Jul 07 '17 at 06:44

postgresql group count distinct using fast way

1 Answers1