Suppose I have an ordered list of frequencies, eg [1, 1, 2]
. I want to be able to quickly sample from this list, with the probability of selecting an option being proportional to its value.
The Alias method lets us do this sampling with O(n) construction time and O(1) query time. I'm interested in the version of this problem where we also support updates or insertions to this list.
Here are some ideas:
- An augmented BST can do this with all operations taking O(log(n))
- You can use a tiered data structure, where you have n/log(n) blocks of size log(n). Each block is stored as an augmented BST, and the top tier data structure is the alias method. Choosing a BST takes O(1) and choosing within it takes O(log(log(n))), so your query time is O(log(log(n))). And whenever an update happens, you need to totally rebuild the top level structure, which takes O(n/log(n)) time.
Are there any faster solutions?