Suppose that I have an n-sided loaded die, where each side k has some probability pk of coming up when I roll it. I’m curious if there is a good data structure for storing this information statically (i.e., for a fixed set of probabilities), so that I can efficiently simulate a random roll of the die.
Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, then generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value.
I rather like this solution, but it seems odd that the runtime doesn’t take the probabilities into account. In particular, in the extreme cases of one side always coming up or the values being uniformly distributed, it’s possible to generate the result of the roll in O(1) using a naive approach, while my solution will still take logarithmically many steps.
Does anyone have any suggestions for how to solve this problem in a way that is somehow “adaptive” in it’s runtime?
Update: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose’s implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!