0

I have looked at the following example posts (amongst many) and none fit my requirements unfortunately:

Here is the description of the problem.

We have a list of key/value elements (I want to avoid using the term 'dictionary' for now), sorted by value, where the 'value' of any given element can change at any time and it changes frequently. Once any given value changes, we need to sort the list of elements by value again. Once the re-sort is done we need to know the index of each element in the list BY KEY to calculate each that element's PERCENTILE.

In other words, we need to calculate the PERCENTILE of each element in a LARGE sorted (by value) list of key/value elements where the values change rapidly.

Obviously there is a 'naive' way of doing this, however, considering the number of changes in values, the large number of key/value pairs, the re-sorting, and the calculation of percentiles each and every time, this will not be functional.

Does anyone have a very fast algorithm (or heuristic) that does this?

Community
  • 1
  • 1
MoMo
  • 1,836
  • 1
  • 21
  • 38
  • When an element changes, you don't have to sort the entire list, just relocate that element to its new place. Percentiles only change for elements between its old and new positions. – Barmar Dec 06 '13 at 17:13
  • Have you looked into a priority queue data structure? – Bruce Dean Dec 06 '13 at 17:15
  • And the percentiles all change by a fixed amount: `1/N` where N is the total number of elements. Except for elements that were tied with either the old value or new value. – Barmar Dec 06 '13 at 17:16
  • I'm not sure what you mean by percentile here. If you have 100 items in your list, sorted by value, is the percentile for the 10th item equal to 10? If I change the value and re-sort so that the item is now at position 52, is the percentile equal to 52? – Jim Mischel Dec 06 '13 at 17:22
  • @JimMischel, Jim, that is correct. Percentile is a simple function of sort position for us. – MoMo Dec 06 '13 at 17:25
  • @Barmar, yes that is correct we don't need to re-cal the percentiles for all elements. Just the elements between the old and new position of the changed value. The problem is not optimizing the percentile calculation, however. The problem is what type of data structure or combination of data structures allows us to do this optimally? – MoMo Dec 06 '13 at 17:29
  • You can insert into a sorted array using O(log n) binary search. – Barmar Dec 06 '13 at 17:31
  • Just out of curiosity, how do you know when a value changes? Do you receive a signal? – Abhishek Bansal Dec 06 '13 at 17:54
  • @Barmar: Insertion into a sorted array is O(n) because you have to move things down in order to make room for the item you're inserting. If you moved an item from position 1 in the array to position 100, you'd have to move items 2 through 100 down one place to make room. – Jim Mischel Dec 06 '13 at 18:20
  • @AbhishekBansal Yes, we receive a signal. Someone takes action on our site and the value changes accordingly. There are a large number of actions and large number of people. – MoMo Dec 06 '13 at 19:45

1 Answers1

0

One approach is to use a hashtable + a balanced order statistics tree.

The hashtable takes the Key, and maps to a node of the tree.

The tree can do stuff like change value, get rank of node etc in O(log n) time.

W. Hoare.
  • 39
  • 1