SQL count distinct per group divided by count distinct of total

Question

I have:

id	value
1	123
1	124
1	125
2	126
2	127
2	127
3	128
3	128
3	128

I want an aggregation like:

id	distinct_count	total_distinct	percentage
1	3	6	0.5
2	2	6	0.33
3	1	6	0.167

I tried applying a window OVER clause like this:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       COUNT(DISTINCT value) OVER () AS total_distinct,
       COUNT(DISTINCT value) / COUNT(DISTINCT value) OVER () AS percentage
FROM have
GROUP BY id

but it seems that DISTINCT inside a window function is not implemented yet.

Is there a way to achieve this without a join?

I got an error: DISTINCT in window function parameters not yet supported — Grizzly2501, May 09 '21 at 13:48
See this SO question for an example of a SUM aggregation that works: https://stackoverflow.com/questions/46909494/percentage-from-total-sum-after-group-by-sql-server — Grizzly2501, May 09 '21 at 13:50
Why ask the question if you can find an answer yourself within 10 minutes? — Luuk, May 09 '21 at 13:53
Not sure what you mean, @Luuk. I can't find the answer, hence the question. — Grizzly2501, May 09 '21 at 13:56
Is the link you just sent not an answer to your question? Please clarify. — Luuk, May 09 '21 at 14:01
The link shows how you would do this for a SUM aggregation. It does not work for COUNT(DISTINCT ...) because that is not implemented yet. — Grizzly2501, May 09 '21 at 14:08

score 1 · Accepted Answer · answered May 09 '21 at 14:14

You can do this:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id

or do:

WITH cte AS (SELECT COUNT(DISTINCT value) AS value FROM have)
SELECT 
       id,
       COUNT(DISTINCT value) AS distinct_count,
       cte.value AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / cte.value AS percentage
FROM have
CROSS APPLY cte
GROUP BY cte.value, id;
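To try the subquery approach against the sample data from the question, a minimal setup sketch follows. The column types are an assumption, since the question shows only the data and not the table definition. The explicit CAST is a swapped-in alternative to the `0.0+` trick above, not part of the original answer; both exist to avoid integer division, which would truncate every ratio to 0.

```sql
-- Assumed schema: the question does not give the column types.
CREATE TABLE have (id INT, value INT);

INSERT INTO have (id, value) VALUES
    (1, 123), (1, 124), (1, 125),
    (2, 126), (2, 127), (2, 127),
    (3, 128), (3, 128), (3, 128);

-- The scalar subquery computes the overall distinct count once;
-- the CAST forces decimal rather than integer division.
SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       CAST(COUNT(DISTINCT value) AS DECIMAL(10, 4))
           / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
ORDER BY id;
```

With this data the distinct counts are 3, 2, and 1 against a total of 6, so the ratios come out at roughly 0.5, 0.33, and 0.167, as in the desired output.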

Thanks! Was too focused on the OVER clause for some reason. – Grizzly2501, May 09 '21 at 14:20

score 1 · Answer 2 · Gordon Linoff · 2021-05-09T15:22:08.367

An alternative method is to enumerate the values and use conditional aggregation:

SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       -- The total must span every id, so add up the per-id sums with a window over the aggregate.
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct_count,
       (SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END) /
        SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0 END)) OVER ()
       ) AS ratio
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) AS seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS seqnum_v
      FROM have h
     ) h
GROUP BY id;

This may be faster than an approach using subqueries.
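
To see why the conditional sums count distinct values, it can help to look at the enumeration on its own. The sketch below runs only the inner query against the table `have` from the question; the final ORDER BY is added purely for readability and is not part of the answer.

```sql
-- seqnum_iv = 1 marks exactly one row per (id, value) pair.
-- seqnum_v  = 1 marks exactly one row per value across the whole table.
SELECT h.id,
       h.value,
       ROW_NUMBER() OVER (PARTITION BY h.id, h.value ORDER BY h.value) AS seqnum_iv,
       ROW_NUMBER() OVER (PARTITION BY h.value ORDER BY h.value) AS seqnum_v
FROM have h
ORDER BY h.id, h.value;
```

With the sample data, the three rows for value 128 receive seqnum_v values 1, 2, and 3 in some order, so only one of them is counted. Across all nine rows exactly six have seqnum_v = 1, matching the six distinct values.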

edited May 09 '21 at 15:22

answered May 09 '21 at 14:45

Gordon Linoff


Thank you, I'll definitely compare runtimes, seeing as this is on large-ish transactional data. – Grizzly2501 May 09 '21 at 15:14
@Grizzly2501 . . . I missed something the first time I answered. I have fixed the answer. – Gordon Linoff May 09 '21 at 15:22
