I have this table, named have:

id  value
1   123
1   124
1   125
2   126
2   127
2   127
3   128
3   128
3   128

I want an aggregation like:

id  distinct_count  total_distinct  percentage
1   3               6               0.5
2   2               6               0.333
3   1               6               0.167

I tried using a window function with an OVER clause, like this:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       COUNT(DISTINCT value) OVER () AS total_distinct,
       COUNT(DISTINCT value) / COUNT(DISTINCT value) OVER () AS percentage
FROM have
GROUP BY id

but COUNT(DISTINCT ...) with an OVER clause does not seem to be implemented yet.

Is there a way to achieve this without a join?

Grizzly2501

2 Answers

You can compute the total with an uncorrelated scalar subquery:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id

Or compute the total once in a CTE and attach it to every row:

WITH cte AS (SELECT COUNT(DISTINCT value) AS value FROM have)
SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       cte.value AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / cte.value AS percentage
FROM have
CROSS APPLY cte
GROUP BY cte.value, id;
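
As a quick check, the first query can be run against the sample data from the question. This is only a sketch under assumptions that are not in the original: it uses Python's built-in sqlite3 module with an in-memory database (the question does not say which engine is in use), adds an ORDER BY for a stable row order, and writes the float conversion as 1.0 * ... instead of 0.0 + ..., which should be equivalent.

```python
import sqlite3

# Rebuild the sample table from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# The scalar subquery is uncorrelated, so it returns the same table-wide
# distinct count for every group.
query = """
SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       1.0 * COUNT(DISTINCT value)
           / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this should give distinct counts of 3, 2 and 1 against a total of 6, matching the table in the question.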
Luuk

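An alternative method is to enumerate the values with ROW_NUMBER() and use conditional aggregation. Each value is counted once per id (seqnum_iv = 1) and once overall (seqnum_v = 1). Within a single id, the seqnum_v count only covers values whose first row falls in that id, so it is summed across all groups with a window SUM to get the table-wide total:

SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       -- window sum of the per-id counts gives the table-wide distinct count
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct_count,
       (SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END) /
        SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0 END)) OVER ()
       ) AS ratio
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) AS seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS seqnum_v
      FROM have h
     ) h
GROUP BY id;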

This references the table only once, so it may be faster than an approach using subqueries.
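
The row-numbering idea can be tried on the sample data with a small harness. This is a sketch under stated assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module with an in-memory database (the question does not name the engine), and it takes the table-wide total as a window SUM over the per-id counts, which needs an engine that allows an aggregate nested inside a window function (SQLite 3.25 or later does). The ORDER BY is added only for a stable row order.

```python
import sqlite3

# Rebuild the sample table from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# seqnum_iv = 1 marks the first row of each (id, value) pair;
# seqnum_v = 1 marks the first row of each value across the whole table.
# Summing the per-id seqnum_v counts over all groups yields the overall
# number of distinct values.
query = """
SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END)
           / SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS ratio
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) AS seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS seqnum_v
      FROM have h
     ) h
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without the outer window SUM, the second column would only count the values first seen in each id (3, 2 and 1 here) rather than the overall 6.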

Gordon Linoff