
I have a set of numbers, [1, 3, 4, 5, 7]. I want to select a number from the set with a probability proportional to its value:

| Number | Probability | % |
| ------ | ----------- | --- |
| 1 | 1/20 | 5 |
| 3 | 3/20 | 15 |
| 4 | 4/20 | 20 |
| 5 | 5/20 | 25 |
| 7 | 7/20 | 35 |

However, I want to be able to both update and query this set in less than O(n) time. Are there any data structures that would help me achieve that?

Preferably in Java, if an implementation already exists.

Rubydesic
  • @kcsquared The first answer requires computing the CDF, which is an O(n) operation. The second one is better, but I can't think of a way to allow removal of items that is not O(n) – Rubydesic Feb 09 '22 at 14:55
  • It appears that [this C++](https://stackoverflow.com/questions/58932354/weighted-random-number-generator-with-updates) based question is almost what you're looking for, apart from the language difference. You can get `O(log n)` amortized (or worst-case, with more work) inserts and deletes by dynamically/geometrically resizing a Fenwick tree. – kcsquared Feb 09 '22 at 15:02
  • Does this answer your question? [Random weighted selection in Java](https://stackoverflow.com/questions/6409652/random-weighted-selection-in-java) – QBrute Feb 09 '22 at 15:07
  • @QBrute kcsquared suggested that. See my first comment – Rubydesic Feb 09 '22 at 15:10

1 Answer


You can get O(log n) amortized querying and updating (including removing/inserting elements) using a Binary Indexed Tree, also known as a Fenwick tree. The idea is to use dynamic resizing, the same trick used in variable-size arrays and hash tables to get amortized constant-time appends. This also implies that you should be able to get O(log n) worst-case bounds using the standard dynamic-array method of rebuilding a second array on the side, but that makes the code significantly longer.

First, note that given the list of partial sums of arr and a random integer r drawn uniformly from [0, sum(arr)), we can make a proportional selection in O(log n) time with a binary search. Specifically, we want the rightmost partial sum less than or equal to r (counting an implicit leading partial sum of 0); its position is the index of the selected element.
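For concreteness, here's a minimal Java sketch of this static version (the class and method names are mine): build the partial sums once in O(n), then each query is one binary search.

```java
import java.util.concurrent.ThreadLocalRandom;

// Weighted sampling over a fixed array of weights via precomputed
// partial sums: O(n) to build, O(log n) per sample.
class PrefixSumSampler {
    private final long[] partial; // partial[i] = arr[0] + ... + arr[i]

    PrefixSumSampler(long[] arr) {
        partial = new long[arr.length];
        long running = 0;
        for (int i = 0; i < arr.length; i++) {
            running += arr[i];
            partial[i] = running;
        }
    }

    // Returns index i with probability arr[i] / sum(arr).
    int sample() {
        long total = partial[partial.length - 1];
        long r = ThreadLocalRandom.current().nextLong(total); // uniform in [0, total)
        // Binary search for the leftmost partial sum strictly greater than r;
        // that position is one past the rightmost partial sum <= r.
        int lo = 0, hi = partial.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (partial[mid] <= r) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

With the weights [1, 3, 4, 5, 7], the partial sums are [1, 4, 8, 13, 20]; a draw of r = 5 lands strictly below 8 and at or above 4, so index 2 (weight 4) is returned, matching the 4/20 probability from the question's table.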

Now, we'll use the Fenwick tree technique from this post to maintain and query the partial sums. That post solves a slightly different problem from yours: it has a fixed set of n keys whose weights can be updated, with no insertions or deletions.

A Fenwick tree is an array that allows you to answer queries about partial sums of a 'base' array in O(log n) time per query, and can be built in O(n) time. In particular, you can

  1. Find the index of the rightmost partial sum of arr less than or equal to r,
  2. Set arr[i] to arr[i]+c for any integer c,

both in O(log n) time.
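As a sketch, both operations look like this on a standard 1-indexed Fenwick tree (this is generic Fenwick code, not taken from the linked post):

```java
// Fenwick tree supporting the two operations above.
// Indices passed in are 0-based; the internal array is 1-based, as usual.
class FenwickTree {
    private final long[] tree;
    private final int n;

    FenwickTree(int size) {
        n = size;
        tree = new long[n + 1];
    }

    // Operation 2: arr[i] += c, in O(log n).
    void add(int i, long c) {
        for (int j = i + 1; j <= n; j += j & -j) {
            tree[j] += c;
        }
    }

    // Operation 1: the largest k such that arr[0] + ... + arr[k-1] <= r,
    // found by descending the implicit tree in O(log n). For r drawn
    // uniformly from [0, total), k is exactly the index selected with
    // probability proportional to its weight; zero-weight slots are
    // never returned.
    int search(long r) {
        int pos = 0;
        for (int step = Integer.highestOneBit(n); step > 0; step >>= 1) {
            int next = pos + step;
            if (next <= n && tree[next] <= r) {
                pos = next;
                r -= tree[next];
            }
        }
        return pos;
    }
}
```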

Start by appending n zeros to arr (so that it is half full) and building its Fenwick tree. We can treat 'removing' an element as setting its weight to 0. To insert an element, use the zero slot just after the rightmost nonzero element in arr as the new element's spot. Removals and insertions may eventually cause our array to fill up: if we reach 75% capacity, rebuild the array and Fenwick tree, doubling the array size (padding with zeros on the right) and deleting all zero-weight elements. If we fall to 25% capacity, shrink the array to half its size, rebuilding the Fenwick tree as well.

You'll need to keep arr itself up to date at all times in order to rebuild, so every update must be applied to both arr and the Fenwick tree. You'll also need a hashmap from array indices to your keys so a random selection can be translated back into a key.
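Putting the pieces together, here's a sketch of the resizing wrapper, assuming the FenwickTree class from the previous sketch; the class name, generic key type, and starting capacity of 8 are all illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class DynamicWeightedSampler<K> {
    private long[] arr;                                     // mirror of the weights, kept for rebuilds
    private FenwickTree tree;
    private final Map<Integer, K> keyAt = new HashMap<>();  // slot -> key
    private final Map<K, Integer> slotOf = new HashMap<>(); // key -> slot
    private int nextFree = 0;  // first zero after the rightmost nonzero element
    private long total = 0;

    DynamicWeightedSampler() {
        arr = new long[8]; // arbitrary starting capacity
        tree = new FenwickTree(arr.length);
    }

    void insert(K key, long weight) {
        if (nextFree >= arr.length * 3 / 4) rebuild(arr.length * 2);
        int slot = nextFree++;
        arr[slot] = weight;
        tree.add(slot, weight);
        keyAt.put(slot, key);
        slotOf.put(key, slot);
        total += weight;
    }

    void remove(K key) {
        int slot = slotOf.remove(key);
        total -= arr[slot];
        tree.add(slot, -arr[slot]); // 'removal' = set the weight to 0
        arr[slot] = 0;
        keyAt.remove(slot);
        if (slotOf.size() <= arr.length / 4) rebuild(Math.max(8, arr.length / 2));
    }

    // Assumes at least one element is present.
    K sample() {
        long r = ThreadLocalRandom.current().nextLong(total);
        return keyAt.get(tree.search(r));
    }

    // Compact live elements to the front, resize, and rebuild the tree.
    private void rebuild(int newSize) {
        long[] fresh = new long[newSize];
        keyAt.clear();
        int i = 0;
        for (Map.Entry<K, Integer> e : slotOf.entrySet()) {
            fresh[i] = arr[e.getValue()];
            keyAt.put(i, e.getKey());
            e.setValue(i++);
        }
        arr = fresh;
        nextFree = i;
        tree = new FenwickTree(arr.length);
        for (int j = 0; j < nextFree; j++) {
            tree.add(j, arr[j]); // O(n log n) rebuild; an O(n) build is also possible
        }
    }
}
```

Each rebuild here costs O(n log n) (or O(n) with a linear-time Fenwick build), but with the 75%/25% thresholds a rebuild can only be triggered after Θ(n) updates, which is what gives the O(log n) amortized bound.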

The good part is that you don't need to modify the Fenwick tree internals at all: given a Fenwick tree implementation in Java that supports initialization, array updates and the binary search, you can treat it as a black box. This stops being true if you want worst-case time guarantees: then, you'll need to copy the internal state of the Fenwick tree, piece by piece, which has some complications.

kcsquared