Data structures for loaded dice?

Question

Suppose that I have an n-sided loaded die, where each side k has some probability p_k of coming up when I roll it. I’m curious if there is a good data structure for storing this information statically (i.e., for a fixed set of probabilities), so that I can efficiently simulate a random roll of the die.

Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, then generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value.
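A minimal Python sketch of this cumulative-table approach. The class name `CumulativeDie` and its methods are hypothetical, and sides are numbered from 0 rather than 1. The weights are assumed non-negative but need not sum to 1, because the random value is scaled by their total instead:

```python
import bisect
import itertools
import random


class CumulativeDie:
    """Loaded die: O(n) setup, O(log n) per roll via binary search (sides are 0-based)."""

    def __init__(self, weights):
        cumulative = list(itertools.accumulate(weights))
        # starts[k] is where side k's interval begins: the total weight of sides 0..k-1.
        self.starts = [0] + cumulative[:-1]
        self.total = cumulative[-1]

    def side_for(self, u):
        """Largest index k with starts[k] <= u, for a value u in [0, total)."""
        return bisect.bisect_right(self.starts, u) - 1

    def roll(self, rng=random):
        # Scale a uniform [0, 1) value by the total instead of normalizing the weights.
        return self.side_for(rng.random() * self.total)
```

Because `bisect_right` returns the position after any run of equal entries, a side with probability 0 (whose start equals the next side's start) is never selected.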

I rather like this solution, but it seems odd that the runtime doesn’t take the probabilities into account. In particular, in the extreme cases of one side always coming up or the values being uniformly distributed, it’s possible to generate the result of the roll in O(1) using a naive approach, while my solution will still take logarithmically many steps.

Does anyone have any suggestions for how to solve this problem in a way that is somehow “adaptive” in its runtime?

Update: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose’s implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!

It's reasonable that there exists an O(1) solution for _each specific case_. – Tim, Feb 17 '11 at 10:37

score 126 · Accepted Answer · edited Sep 12 '12 at 18:12

You are looking for the alias method, which provides an O(1) method for sampling from a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) after a one-time O(n) setup. You can find it documented in chapter 3 of "Non-Uniform Random Variate Generation" by Luc Devroye.

The idea is to take your array of probabilities p_k and produce three new n-element arrays, q_k, a_k, and b_k. Each q_k is a probability between 0 and 1, and each a_k and b_k is an integer between 1 and n.

To roll the die, that is, to generate a random integer between 1 and n, generate two independent uniform random numbers r and s in [0, 1). Let i = floor(r*n) + 1. If q_i < s, return a_i; otherwise return b_i. All the work in the alias method lies in computing q_k, a_k, and b_k.
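One way to compute these arrays is Vose's construction, mentioned in the question's update. The sketch below is an illustrative Python version, not code taken from Devroye's chapter; `AliasDie`, `prob`, and `alias` are hypothetical names, and sides are 0-based. In the notation above it corresponds to q_k = `prob[k]`, a_k = `alias[k]`, and b_k = k, so column k keeps its own side when s < q_k and switches to a_k otherwise:

```python
import random


class AliasDie:
    """Vose's alias method: O(n) setup, O(1) per roll (sides are 0-based)."""

    def __init__(self, weights):
        n = len(weights)
        total = sum(weights)
        # Scale so the average column height is exactly 1 (weights need not sum to 1).
        scaled = [w * n / total for w in weights]
        self.prob = [1.0] * n          # q_k: chance that column k keeps side k
        self.alias = list(range(n))    # a_k: side returned otherwise
        small = [k for k, x in enumerate(scaled) if x < 1.0]
        large = [k for k, x in enumerate(scaled) if x >= 1.0]
        while small and large:
            s = small.pop()
            big = large.pop()
            self.prob[s] = scaled[s]
            self.alias[s] = big
            # Column `big` donates 1 - scaled[s] of its height to fill column s.
            scaled[big] = (scaled[big] + scaled[s]) - 1.0
            (small if scaled[big] < 1.0 else large).append(big)
        # Columns left in either list are full up to rounding error, so their
        # prob stays at 1.0; this absorbs floating-point drift.

    def roll(self, rng=random):
        k = rng.randrange(len(self.prob))              # column chosen uniformly
        return k if rng.random() < self.prob[k] else self.alias[k]
```

Each column starts with height n·p_k; short columns are topped up by tall ones until every column has height 1, and each loop iteration retires one column, so setup is O(n).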

edited Sep 12 '12 at 18:12 by Peter Mortensen · answered Feb 17 '11 at 19:13 by mhum

For such a useful algorithm, the Alias Method is surprisingly not very well-known. – mhum Feb 18 '11 at 03:21
For the record: I published a little C library for random sampling using the alias method http://apps.jcns.fz-juelich.de/ransampl. – Joachim W Aug 15 '13 at 16:52
A specific implementation of the alias method [may be slower than a method with worse time complexity, such as Roulette Wheel](https://bugs.python.org/msg197540), for a given `n` and a chosen number of random numbers to generate, due to the constant factors involved in implementing the algorithms. – jfs Dec 05 '16 at 06:59
The implementations I've seen require the incoming list to be normalized so that the sum of all weights equals 1. I wonder if it would be reasonable to toss this requirement and instead require integer weights, finding the actual sum as part of the algorithm and then using (My wife is in a hurry so I can't calculate right now, but I think it would be either the sum or the sum / 2) instead of 1 as your "full" point. In particular, I wonder if this would dodge the "numerical inaccuracies" that the Vose implementation corrects for. – Trevortni May 08 '21 at 18:20
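On the integer-weight question, a hedged Python sketch: if each non-negative integer weight is multiplied by n, every column of the alias table has a capacity of exactly the sum of the weights, so both the construction and the per-roll comparison stay in exact integer arithmetic, with no normalization and no rounding. `IntegerAliasDie` and `cutoff` are hypothetical names, sides are 0-based, and the weights are assumed to have a positive sum:

```python
import random


class IntegerAliasDie:
    """Alias table using exact integer arithmetic (sides are 0-based).

    Assumes non-negative integer weights with a positive sum.
    """

    def __init__(self, weights):
        n = len(weights)
        self.total = sum(weights)           # every column holds exactly `total` units
        scaled = [w * n for w in weights]   # scaled weights sum to n * total
        self.cutoff = [self.total] * n      # keep side k when u < cutoff[k]
        self.alias = list(range(n))
        small = [k for k, x in enumerate(scaled) if x < self.total]
        large = [k for k, x in enumerate(scaled) if x >= self.total]
        while small and large:
            s = small.pop()
            big = large.pop()
            self.cutoff[s] = scaled[s]
            self.alias[s] = big
            # Column `big` supplies the missing total - scaled[s] units of column s.
            scaled[big] -= self.total - scaled[s]
            (small if scaled[big] < self.total else large).append(big)
        # Exact arithmetic guarantees any leftover column is exactly full.

    def roll(self, rng=random):
        k = rng.randrange(len(self.cutoff))
        u = rng.randrange(self.total)       # uniform integer in [0, total)
        return k if u < self.cutoff[k] else self.alias[k]
```

With exact arithmetic the two lists cannot drift out of step: any columns left over at the end are exactly full, so their default cutoff of `total` is already correct. The price is drawing a uniform integer in [0, total) instead of a float.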

score 5 · Answer 2 · hugomg · 2011-02-17T20:10:09.907

Use a balanced binary search tree (or binary search in an array) to get O(log n) complexity. Have one node for each die result, keyed by the interval of random values that triggers that result.

def get_result(node, seed):
    # Descend until reaching the node whose interval [start, end) contains seed.
    if seed < node.interval.start:
        return get_result(node.left_child, seed)
    elif seed < node.interval.end:
        # start <= seed < end
        return node.result
    else:
        return get_result(node.right_child, seed)

The good thing about this solution is that it is very simple to implement but still has good complexity.
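A self-contained Python sketch of the same idea, including one way to keep the tree balanced: insert the middle interval first and then recurse on each half, which is the kind of "correct order" a comment on this answer refers to. `Node`, `build_balanced`, `find_side`, and `roll` are hypothetical names; `find_side` is an iterative version of `get_result` with the interval flattened into `start`/`end` fields, and sides are 0-based:

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: float                 # this node's side covers seeds in [start, end)
    end: float
    result: int
    left_child: Optional["Node"] = None
    right_child: Optional["Node"] = None


def build_balanced(weights):
    """Return (root, total); the middle interval goes in first, then each half."""
    ends = list(itertools.accumulate(weights))
    starts = [0] + ends[:-1]

    def build(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        return Node(starts[mid], ends[mid], mid, build(lo, mid), build(mid + 1, hi))

    return build(0, len(weights)), ends[-1]


def find_side(node, seed):
    """Iterative get_result: descend until start <= seed < end."""
    while node is not None:
        if seed < node.start:
            node = node.left_child
        elif seed < node.end:
            return node.result
        else:
            node = node.right_child
    raise ValueError("seed must lie in [0, total)")


def roll(root, total, rng=random):
    return find_side(root, rng.random() * total)
```

Building median-first gives a tree of depth about log2(n + 1), so each roll takes O(log n) comparisons, the same as binary search in an array.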

edited Feb 17 '11 at 20:10 · answered Feb 17 '11 at 17:24 by hugomg

A hand-made binary tree like the one above is simple to implement, but it is not guaranteed to be balanced. – yusong Jul 14 '18 at 04:04
You can guarantee that it is balanced if you construct it in the correct order. – hugomg Jul 14 '18 at 16:46

score 3 · Answer 3 · edited Sep 12 '12 at 18:16

I'm thinking of granulating your table.

Instead of having a table with the cumulative probability for each die value, you could create an integer array of length xN, where x is ideally a large number, to increase the accuracy of the probabilities.

Populate this array by treating each index (divided by xN) as a cumulative probability, and in each slot store the die roll that the cumulative table would give for that value.

Maybe an example explains it more easily:

Using a three-sided die: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3

Create an array; in this case I will choose a simple length, say 10 (that is, x = 3.33333).

arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3

Then, to roll the die, pick a uniform random integer from 0 to 9 (that is, in [0, 10)) and simply return the value stored at that index.

This method may lose accuracy, since each probability is effectively rounded to a multiple of 1/(xN), but if you increase x the accuracy will be sufficient.
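A short Python sketch of building and using such a table. `build_lookup_table` and `roll_lookup` are hypothetical names; `size` plays the role of xN, sides are numbered from 1 as in the example, and each side's slot count comes from rounding its cumulative boundary, which is one reasonable choice among several:

```python
import itertools
import random


def build_lookup_table(weights, size):
    """Return a list of `size` slots; side k fills about weights[k-1]/total of them."""
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    table = []
    for side, c in enumerate(cumulative, start=1):
        # Fill slots up to the rounded position of this side's cumulative boundary.
        boundary = round(c * size / total)
        table.extend([side] * (boundary - len(table)))
    return table


def roll_lookup(table, rng=random):
    # A uniform index in [0, len(table)), then one array access: O(1) per roll.
    return table[rng.randrange(len(table))]
```

For example, `build_lookup_table([0.2, 0.5, 0.3], 10)` reproduces the array above. With three equal weights and 10 slots, the middle side gets 4 slots instead of 3.33; rounding each boundary keeps every side's error below one slot, so a larger table shrinks the relative error.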

For full accuracy you can do the array lookup as a first step, and for array intervals that correspond to multiple sides do a search there. — aaz, Feb 17 '11 at 17:06
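The hybrid in this comment can be sketched as what is often called a guide table: split [0, total) into equal buckets, store for each bucket the first side that can occur in it, and finish with a short scan. This Python version is illustrative only; `GuideTableDie`, `guide`, and `side_for` are hypothetical names, sides are 0-based, and `buckets` plays the role of the table length:

```python
import bisect
import itertools
import random


class GuideTableDie:
    """Table lookup first, then a short search inside the bucket (sides are 0-based)."""

    def __init__(self, weights, buckets):
        self.ends = list(itertools.accumulate(weights))  # side k covers [ends[k-1], ends[k])
        self.total = self.ends[-1]
        self.buckets = buckets
        last = len(self.ends) - 1
        # guide[j]: first side whose interval extends past the start of bucket j.
        self.guide = [min(bisect.bisect_right(self.ends, j * self.total / buckets), last)
                      for j in range(buckets)]

    def side_for(self, u):
        """Exact side for a value u in [0, total)."""
        j = min(int(u * self.buckets / self.total), self.buckets - 1)
        k = self.guide[j]
        # Step back if rounding put the bucket start slightly above u.
        while k > 0 and self.ends[k - 1] > u:
            k -= 1
        # Scan forward past sides that end at or before u.
        while self.ends[k] <= u:
            k += 1
        return k

    def roll(self, rng=random):
        return self.side_for(rng.random() * self.total)
```

The result is exact for any bucket count; the table only decides where the scan starts. With a number of buckets on the order of n, the scan is short on average, so most rolls cost one lookup plus a comparison or two.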

Peter O. · Answer 4 · 2022-04-06T18:42:46.290

There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.

One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max and that each weight is 0 or greater. Rejection sampling needs a constant number of attempts on average for a given distribution (each attempt succeeds with probability (f(1) + ... + f(k)) / (k·max)), but that constant depends greatly on the shape of the distribution, and in the worst case the method runs forever. To choose an integer in [1, k] using rejection sampling:

1. Choose a uniform random integer i in [1, k].
2. With probability f(i)/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max], and if that number is f(i) or less, return i; otherwise go to step 1.)
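The two steps translate almost directly into Python. This sketch uses the integer-weight variant from step 2; `rejection_sample`, `f`, and `max_weight` are hypothetical names, and at least one weight is assumed to be positive, since otherwise the loop never ends:

```python
import random


def rejection_sample(f, k, max_weight, rng=random):
    """Choose an integer in [1, k] with probability proportional to f(i).

    Assumes integer weights with 0 <= f(i) <= max_weight and some f(i) > 0.
    """
    while True:
        i = rng.randint(1, k)                    # step 1: uniform candidate
        if rng.randint(1, max_weight) <= f(i):   # step 2: accept with prob f(i)/max_weight
            return i
```

A distribution with one dominant weight (max far above the average weight) needs many passes, while a uniform one always accepts on the first pass.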

Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. Many of these algorithms were introduced after 2011, and they include:

The Bringmann–Larsen succinct data structure ("Succinct Sampling from Discrete Distributions", 2012),
Yunpeng Tang's multi-level search ("An Empirical Study of Random Sampling Methods for Changing Discrete Distributions", 2019), and
the Fast Loaded Dice Roller (2020).

Other algorithms include the alias method (already mentioned in your article), the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.
