1

Algo (Source: Elements of Programming Interviews, 5.16)

You are given n numbers as well as probabilities p0, p1, ..., pn-1, which sum up to 1. Given a random number generator that produces values in [0, 1] uniformly, how would you generate one of the n numbers according to their specified probabilities?

Example

If the numbers are 3, 5, 7, 11, and the probabilities are 9/18, 6/18, 2/18, 1/18, then in 1,000,000 calls to the program, 3 should appear about 500,000 times, 7 should appear about 111,111 times, etc.

The book says to create the prefix sums p0, p0 + p1, p0 + p1 + p2, etc., so in the example above the intervals are [0.0, 0.5), [0.5, 0.8333), etc. Combining these intervals into a sorted array of endpoints could look something like [1/18, 3/18, 9/18, 18/18]. Then run the random number generator and find the smallest endpoint that is larger than the generated value - the array index it corresponds to maps to an index into the given n numbers.

This requires O(N) pre-processing time and then O(log N) per lookup to binary search for the value.
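The book's scheme can be sketched as follows (Python; `make_sampler` is my own name for it, not the book's):

```python
import bisect
import itertools
import random

def make_sampler(numbers, probs):
    """O(n) pre-processing: prefix sums p0, p0+p1, ..., 1.0."""
    prefix = list(itertools.accumulate(probs))
    def sample():
        # O(log n) per call: find the first endpoint strictly greater
        # than the uniform draw; its index selects the number.
        r = random.random()
        i = bisect.bisect_right(prefix, r)
        return numbers[min(i, len(numbers) - 1)]  # guard against float drift
    return sample

sample = make_sampler([3, 5, 7, 11], [9/18, 6/18, 2/18, 1/18])
```

Over many calls, `sample()` returns 3 about half the time, 5 about a third of the time, and so on.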

I have an alternate solution that requires O(N) pre-processing time and O(1) execution time, and am wondering what may be wrong with it.

Why can't we iterate through the n numbers and, for each one, build an array containing that number repeated 100 * (its probability) times - e.g. [3] repeated (9/18) * 100 = 50 times? Concatenate all these arrays to get, at the end, a list of 100 elements, where the count of each number reflects how likely it is to occur. Then run the random number function, index into the array, and return the value.

Wouldn't this be more efficient than the provided solution?
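For reference, a sketch of this idea (assuming the fixed table size of 100 from the question; note that rounding makes it inexact for these probabilities, since 6/18 of 100 is not a whole number):

```python
import random

def make_table(numbers, probs, size=100):
    # Repeat each number round(size * p) times. This is only exact when
    # every p is a whole multiple of 1/size.
    table = []
    for x, p in zip(numbers, probs):
        table.extend([x] * round(size * p))
    return table

table = make_table([3, 5, 7, 11], [9/18, 6/18, 2/18, 1/18])
# O(1) lookup: index with a uniform random integer.
value = table[random.randrange(len(table))]
```

With these inputs the table holds 50 threes but only 33 fives (instead of the exact 33.33...), which is the inaccuracy the comments and answers point out.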

Prune
segue_segway
  • What about `[1/e, 1/pi, 1/(1-e-p)]`? – Kijewski May 11 '18 at 16:25
  • What if the probability of drawing a "4" isn't 13%, but 13.241235123%? Instead of generating thirteen "4"s in your list, now you have to generate 13241235123 "4"s and obviously an increased number of other possible draw values too. – n3utrino May 11 '18 at 16:26
  • Your approach only produces accurate results if the individual probabilities are integer percentages. You could get perfect results for the example problem by choosing an array size of 18 (or multiple thereof), but in general there's no reason to expect the probabilities to even be rational numbers. – jasonharper May 11 '18 at 16:27

2 Answers

2

Your number 100 is not independent of the input; it depends on the given p values. Any parameter that depends on the magnitude of the input values is really exponential in the input size, meaning you are actually using exponential space. Just constructing that array would thus take exponential time, even if it were structured to allow constant lookup time after generating the random number.

Consider two p values, 0.01 and 0.99. 100 values is sufficient to implement your scheme. Now consider 0.001 and 0.999. Now you need an array of 1,000 values to model the probability distribution. The amount of space grows with (I believe) the ratio of the largest p value to the smallest, not with the number of p values given.
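To put numbers on that (my own sketch, not from the answer): for rational probabilities the minimal table size is the least common multiple of their reduced denominators, which is governed by the smallest probability rather than by n:

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

def table_size(probs):
    # Smallest table in which every probability occupies a whole number
    # of slots: the LCM of the (reduced) denominators.
    return lcm(*(Fraction(p).limit_denominator().denominator for p in probs))

print(table_size([Fraction(1, 100), Fraction(99, 100)]))    # 100
print(table_size([Fraction(1, 1000), Fraction(999, 1000)])) # 1000
```

Halving the smallest probability can double the table, with no change to n.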

chepner
2

If you have rational probabilities, you can make that work. Rather than 100, you must use a common denominator of the rational probabilities. Insisting on 100 items will not fulfill the specs of your assigned example, let alone more diabolical ones.
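A sketch of that fix using `fractions` (the helper name and LCM framing are mine): the table length is the common denominator, so the example needs only 18 entries and every probability is represented exactly.

```python
import random
from fractions import Fraction
from math import lcm  # Python 3.9+

def make_exact_table(numbers, probs):
    # probs as exact Fractions; the table length is a common denominator.
    denom = lcm(*(p.denominator for p in probs))
    table = []
    for x, p in zip(numbers, probs):
        # Each number gets exactly p * denom slots (always an integer).
        table.extend([x] * (p.numerator * denom // p.denominator))
    return table

probs = [Fraction(9, 18), Fraction(6, 18), Fraction(2, 18), Fraction(1, 18)]
table = make_exact_table([3, 5, 7, 11], probs)  # 18 entries, exact
value = table[random.randrange(len(table))]
```

Lookup stays O(1); only the pre-processing cost now depends on the common denominator rather than on a fixed 100.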

Prune