4

I got table T with 2 column as follow example:

C1      C2
----------
A       x
A       x
A       y
B       x
B       x

I want to count number of distinct C1 for each value in C2. This result should be like:

C1      distinct count
----------------------
A       2               // count distinct x,x,y = 2
B       1               // count distinct x,x = 1

it is easy to come out with a SQL query like this

select C1, count(distinct C2) from T group by C1

however, as discussed in postgresql COUNT(DISTINCT …) very slow, this query yield poor performance. I would like to use the improved query (count (*) (select distinct ...)) as suggested in that article but I don't know how to form the query with group by.

Sylvain Bugat
  • 7,704
  • 3
  • 29
  • 30
user1817188
  • 487
  • 1
  • 7
  • 14

1 Answers1

6

Try this query if you want avoid DISTINCT keyword

Sample Data:

stackoverflow=# select * from T;
 c1 | c2 
----+----
 A  | x
 A  | x
 A  | y
 B  | x
 B  | x
(5 rows)

Query:

stackoverflow=# WITH count_distinct as (SELECT C1 FROM T GROUP BY c1,c2)
SELECT c1,count(c1) FROM count_distinct GROUP BY C1;  --updated missing group by

Output:

 c1 | count 
----+-------
 B  |     1
 A  |     2
(2 rows)

Same output, but you should try the performance first.

Adrian Hartanto
  • 435
  • 2
  • 7