Algorithm for generating random numbers from a discrete distribution?

Question

Design a fast algorithm to repeatedly generate numbers from the discrete distribution: Given an array a[] of non negative real numbers that sum to 1, the goal is to return index i with probability a[i]

I found this question in a an online algorithm book, Introduction to Programming in Java, chapter 4.2: Sorting and Searching (http://introcs.cs.princeton.edu/java/42sort/) .

the hint says:

Form an array s[] of cumulated sums such that s[i] is the sum of the first i elements of a[]. Now, generate a random real number r between 0 and 1, and use binary search to return the index i for which s[i] ≤ s[i+1].

some how I am not able to understand the hint and hence cant find the solution..

What have you tried so far that isn't working? Please post your code and an explanation of how it's not working as you'd expect, and someone here will be happy to help you figure out how to fix it. We don't just do your work for you, though - you do need to do some work to try and figure it out yourself first. :) — Ken White, Apr 08 '12 at 16:28
possible duplicate of [Data structure for loaded dice?](http://stackoverflow.com/questions/5027757/data-structure-for-loaded-dice) — templatetypedef, Apr 08 '12 at 16:35
Uhmm, since `a[]` is non-negative, `s[i] <= s[i+1]` is true for all `i`. The hint seems wrong. I think it means to say you have to return the first `i` such that `s[i] >= r`. — IVlad, Apr 08 '12 at 16:36

score 8 · Answer 1 · answered Apr 08 '12 at 16:31

There are many ways to answer this problem. This article describes numerous approaches, their strengths, their weaknesses, and their runtimes. It concludes with an algorithm that takes O(n) preprocessing time, then generates numbers in time O(1) each.

The particular approach you're looking for is described under "roulette wheel selection."

Hope this helps!

score 2 · Answer 2 · answered Apr 08 '12 at 19:45

Here's a Python algorithm which implements the 'Roulette Wheel' Technique. It's tough to explain without a graphic. The article linked to by templatetypedef should do well for that. Also, note that this algorithm doesn't actually require the weights to be normalized (they don't need to sum to 1), but this will work nonetheless.

import random

trials = 50
selected_indices = []

# weights on each index
distrib = [0.1, 0.4, 0.2, 0.3]

index = random.randrange(0, len(distrib) - 1)
max_weight = max(distrib)
B = 0
# generate 'trials' random indices
for i in range (trials):

    # increase B by a factor which is
    # guaranteed to be much larger than our largest weight
    B = B + random.uniform(0, 2 * max_weight)

    # continue stepping through wheel until B lands 'within' a weight
    while(B > distrib[index]):
        B = B - distrib[index]
        index = (index + 1) % len(distrib)
    selected_indices.append(index)

print("Randomly selected indices from {0} trials".format(trials))
print(selected_indices)

wildplasser · Answer 3 · 2012-04-08T16:54:51.750

This is a snippet from wakkerbot/megahal. Here the weights are (unsigned) ints, and their sum is in node->childsum. For maximum speed, the children are sorted (more or less) in descending order. (the weights are expected to have a power-law like distribution, with only a few high weights and many smaller ones)

    /*
     *          Choose a symbol at random from this context.
     *          weighted by ->thevalue
     */
    credit = urnd( node->childsum );
    for(cidx=0; 1; cidx = (cidx+1) % node->branch) {
        symbol = node->children[cidx].ptr->symbol;
        if (credit < node->children[cidx].ptr->thevalue) break;
        /* 20120203 if (node->children[cidx].ptr->thevalue == 0) credit--; */
        credit -= node->children[cidx].ptr->thevalue;
    }
done:
    // fprintf(stderr, "{+%u}", symbol );
    return symbol;

score 0 · Answer 4 · answered Apr 08 '12 at 17:35

Depending on the granularity, you can create an Index with 100, 1000 or 10000 Elements. Assume a distribution (a,b,c,d) with p=(10%, 20%, 30%, 40%), we create a Map:

val prob = Map ('a' -> 10, 'b' -> 20, 'c' -> 30, 'd' -> 40) 
val index = (for (e <- prob;
  i <- (1 to e._2)) yield e._1 ).toList 

index: List[Char] = List(a, a, a, a, a, a, a, a, a, a, 
b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, 
c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, 
d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d)

We can now pick an element of desired probability very fast:

val x = index (random.nextInt (100))

x is now by 40% d, by 10% a and so on. Short setup, fast access.

The numbers don't even need to sum up to 100, but you have to calculate the range once, then:

val max = prob.map (_._2).sum 
val x = index (random.nextInt (max))

Algorithm for generating random numbers from a discrete distribution?

4 Answers4

Linked