
Assuming I can generate random bytes of data, how can I use that to choose an element out of an array of n elements?

If I have 256 elements I can generate 1 byte of entropy (8 bits), and then use that to pick my element simply by converting it to an integer.

If I have 2 elements I can generate 1 byte, discard 7 bits and use the remaining bit to select my element.

But what if I have 3 elements? 1 bit is too few and 2 is too many. How would I randomly select 1 of the 3 elements with equal probability?
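
For concreteness, here is what those first two cases look like in Python (secrets.token_bytes standing in for my source of random bytes):

```python
import secrets

# 256 elements: one random byte, converted to an integer, is already
# a uniform index into the array.
elements = list(range(256))
index = secrets.token_bytes(1)[0]      # uniform integer in 0..255
picked = elements[index]

# 2 elements: generate a byte, keep one bit, and discard the other seven.
coin = ["heads", "tails"]
bit = secrets.token_bytes(1)[0] & 1    # uniform bit: 0 or 1
picked2 = coin[bit]
```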

mpen
  • You use 2 bits. In case of 11 you repeat. – user515430 Oct 28 '14 at 16:12
  • @user515430 Is that the best way? If I have 1025 elements I have to use 11 bits and I have nearly a 50% chance of having to re-pick. – mpen Oct 28 '14 at 16:15
  • Which means on average you only have to repeat a single time. But yes, you can do better. It is just more involved. Suppose you want to select one of five. Instead of using 3 bits, you could use 4 bits and only have to repeat one in sixteen tries. – user515430 Oct 28 '14 at 18:38
  • @user515430 Wouldn't using more bits *increase* the odds of a miss? – mpen Oct 28 '14 at 19:19
  • No. With four bits, 0000, 0001, 0010 mean the first choice, 0011, 0100, 0101 mean the second choice, etc. Only 1111 requires a repeat. – user515430 Oct 28 '14 at 21:21
  • Good exposition here: https://stackoverflow.com/questions/11758809/what-is-the-optimal-algorithm-for-generating-an-unbiased-random-integer-within-a – danh Feb 14 '23 at 23:49
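
A minimal Python sketch of the rejection approach described in the comments above, including the trick of drawing extra bits so that repeats become rare (secrets.randbits is just an assumed stand-in for the random bit source):

```python
import secrets

def rand_index(n, k):
    """Uniformly pick an index in range(n) from k random bits per draw.

    Requires 2**k >= n. Using a larger k lowers the chance that a
    draw is rejected and must be repeated.
    """
    group = (1 << k) // n          # how many k-bit values map to each index
    limit = group * n              # values below this are accepted
    while True:
        value = secrets.randbits(k)
        if value < limit:
            return value // group  # e.g. n=5, k=4: 0-2 -> 0, 3-5 -> 1, ...
        # otherwise reject and draw k fresh bits
```

For example, with n = 1025 a draw of k = 11 bits is rejected with probability 1023/2048 (about 50%), whereas k = 13 bits brings that down to 1017/8192 (about 12%), at the cost of two extra bits per draw.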

2 Answers


Here is a survey of algorithms to generate uniform random integers from random bits.

Some of these algorithms are "constant-time", others are unbiased, and still others are "optimal" in terms of the number of random bits they use on average, assuming we have a "true" random generator that can produce unbiased and independent random bits.

For further discussion, see the following answer of mine:

Peter O.
  • See "Daniel Lemire. 2019. Fast Random Integer Generation in an Interval. ACM Trans. Model. Comput. Simul. 29, 1, Article 3 (February 2019), 12 pages. DOI:https://doi.org/10.1145/3230636" for an update on your third reference. This is "nearly divisionless," and his benchmarks show that it significantly outperforms the widely-used algorithms from Java and OpenBSD. – pjs Aug 31 '23 at 20:23

You can generate the proper distribution by simply restricting the generated values to the necessary range. If you have N elements, generate K = ceiling(log2(N)) random bits. Doing so is slightly inefficient, but it still works as long as the K bits are generated randomly.

In your example with N = 3 you need at least K = 2 bits, which gives the four equally likely outcomes [00, 01, 10, 11]. To map these into the proper range, just ignore one of the outcomes, such as the last one: whenever it comes up, discard it and generate two fresh bits. Think of this as creating a new joint probability distribution, p(x_1, x_2), over the two bits where p(x_1=1, x_2=1) = 0, while each of the remaining outcomes has probability 1/3 due to renormalization (i.e., (1/4)/(3/4) = 1/3).
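
A minimal sketch of this for N = 3, assuming Python's secrets.randbits as the source of the two random bits:

```python
import secrets

def pick_one_of_three(elements):
    # Rejection sampling with K = 2 bits: 00, 01, 10 map to the three
    # elements; the outcome 11 is ignored and two fresh bits are drawn.
    while True:
        bits = secrets.randbits(2)   # uniform over {0, 1, 2, 3}
        if bits != 3:                # 3 == binary 11
            return elements[bits]

# Example: each element comes up with probability 1/3.
print(pick_one_of_three(["a", "b", "c"]))
```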

user119961
  • I'm not sure I follow. What do you mean by "just ignore one out of the outcomes"? Do you mean I should generate some new bits and try again or what? – mpen Oct 28 '14 at 16:57
  • His method is called "rejection sampling", and yes, it's the only way to get truly unbiased random distributions. – Lee Daniel Crocker Oct 28 '14 at 20:24