Cumulative count of duplicates

Question

For a table looking like

ID    | Value
-------------
1     | 2
2     | 10
3     | 3
4     | 2
5     | 0
6     | 3
7     | 3

I would like to calculate the number of IDs with a higher Value, for each Value that appears in the table, i.e.

Value | Position
----------------
10    | 0
3     | 1
2     | 4
0     | 6

This equates to the offset of the Value in a ORDER BY Value ordering.

I have considered doing this by calculating the number of duplicates with something like

SELECT Value, count(*) AS ct FROM table GROUP BY Value";

And then cumulating the result, but I guess that is not the optimal way to do it (nor have I managed to combine the commands accordingly)

How would one go about calculating this efficiently (for several dozens of thousands of rows)?

score 3 · Accepted Answer · edited May 23 '17 at 10:26

3

This seems like a perfect opportunity for the window function rank() (not the related dense_rank()):

SELECT DISTINCT ON (value)
       value, rank() OVER (ORDER BY value DESC) - 1 AS position
FROM   tbl
ORDER  BY value DESC;

rank() starts with 1, while your count starts with 0, so subtract 1.

Adding a DISTINCT step (DISTINCT ON is slightly cheaper here) to remove duplicate rows (after computing counting ranks). DISTINCT is applied after window functions. Details in this related answer:

Best way to get result count before LIMIT was applied

Result exactly as requested.
An index on value will help performance.

SQL Fiddle.

edited May 23 '17 at 10:26

Community

1
1

answered Nov 13 '14 at 00:25

Erwin Brandstetter

605,456
145
1,078
1,228

1

Incredible. I would not have thought that command/function exists! Thank you very much. The rank() is even better for what I am attempting it turned out, so everything is great! – CBenni Nov 13 '14 at 09:09

score 1 · Answer 2 · answered Nov 13 '14 at 03:55

1

You might also try this if you're not comfortable with window functions:

SELECT t1.value, COUNT(DISTINCT t2.id) AS position
  FROM tbl t1 LEFT OUTER JOIN tbl t2
    ON t1.value < t2.value
 GROUP BY t1.value

Note the self-join.

answered Nov 13 '14 at 03:55

David Faber

12,277
2
29
40

Cumulative count of duplicates

2 Answers2