
If I have a true random number generator (TRNG) which can give me either a 0 or a 1 each time I call it, then it is trivial to generate any number in a range whose length is a power of 2. For example, if I wanted to generate a random number between 0 and 63, I would simply poll the TRNG 6 times, for a maximum value of 111111 and a minimum value of 000000. The problem is when I want a number in a range not equal to 2^n. Say I wanted to simulate the roll of a die. I would need a range between 1 and 6, with equal weighting. Clearly, I would need three bits to store the result, but polling the TRNG 3 times would introduce two erroneous values. We could simply ignore them, but then that would give one side of the die much lower odds of being rolled.
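The power-of-two case can be sketched in Python like this (`trng()` is a stand-in for the hardware bit source, simulated here with the standard library's `random.getrandbits`):

```python
import random

def trng():
    # Stand-in for a true random bit source: returns 0 or 1.
    return random.getrandbits(1)

def rand_pow2(n_bits):
    """Concatenate n_bits random bits into an integer in [0, 2**n_bits - 1]."""
    value = 0
    for _ in range(n_bits):
        value = (value << 1) | trng()
    return value

# A value between 0 and 63 takes six bits:
x = rand_pow2(6)
```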

My question is: how does one most effectively deal with this?

krfkeith
    Why would ignoring the invalid results change the weighting for one die side? Generate three random bits, producing any of 0,1,2,3,4,5,6,7 with equal probability. If 0 or 7, discard and try again. – Chowlett Aug 07 '13 at 11:20
  • Wouldn't polling until you got a valid number be enough? – Qiau Aug 07 '13 at 11:20
  • Does this answer your question? [How to generate a random integer in the range \[0,n\] from a stream of random bits without wasting bits?](https://stackoverflow.com/questions/6046918/how-to-generate-a-random-integer-in-the-range-0-n-from-a-stream-of-random-bits) – Peter O. Feb 14 '23 at 23:16

3 Answers


The easiest way to get a perfectly accurate result is by rejection sampling. For example, generate a random value from 1 to 8 (3 bits), rejecting and generating a new value (3 new bits) whenever you get a 7 or 8. Do this in a loop.
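A minimal Python sketch of this rejection loop (`trng()` is assumed to be the bit source; `random.getrandbits` stands in for it here):

```python
import random

def trng():
    # Stand-in for the true random bit source: returns 0 or 1.
    return random.getrandbits(1)

def roll_die():
    """Rejection sampling: draw 3 bits for a uniform value in 0..7;
    accept 0..5 and map to die faces 1..6, otherwise retry."""
    while True:
        r = (trng() << 2) | (trng() << 1) | trng()  # uniform in 0..7
        if r < 6:
            return r + 1  # shift 0..5 onto faces 1..6
```

Each attempt succeeds with probability 6/8, so the loop runs 4/3 times on average (about 4 bits per roll).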

You can get arbitrarily close to accurate just by generating a large number of bits, taking the result mod 6, and living with the bias. In cases like 32-bit values mod 6, the bias is so small that it is almost impossible to detect, even after simulating millions of rolls.
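For instance (a sketch; the bias figure follows from 2**32 mod 6 = 4):

```python
import random

def biased_roll():
    """Take 32 random bits mod 6.

    2**32 = 715827882 * 6 + 4, so faces 1..4 each occur one extra time
    out of roughly 7e8 -- a bias far too small to detect over millions
    of simulated rolls.
    """
    return random.getrandbits(32) % 6 + 1
```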

Lee Daniel Crocker
    You'd be surprised how hard it is to detect the bias. I've written lots of statistical testing code for RNGs. If you only need a million dice rolls or so, a 32-bit RNG will not be *detectably* biased. For billions, it can be. That's why most people just blindly use the floating-point function of their generator--which is no less biased than the integer one--and never notice. – Lee Daniel Crocker Aug 07 '13 at 18:09
  • However, if you generate 32 bits and then take mod 6 in order to generate a number in the range 0..5, you use far more bits than the rejection method and require a division. The rejection method needs only about four bits on average. – Antti Huima Aug 08 '13 at 09:01
  • Yep, that's why I use rejection sampling in my own library, with no division. – Lee Daniel Crocker Aug 08 '13 at 09:15

If you want a number in the range 0 .. R − 1, pick the least n such that R ≤ 2^n. Then generate a random number r in the range 0 .. 2^n − 1 using your method. If it is greater than or equal to R, discard it and generate again. The probability that a given attempt fails in this manner is at most 1/2, so you will get a number in your desired range in fewer than two attempts on average. This method is unbiased and does not impair the randomness of the result in any fashion.
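A Python sketch of this general method (using `random.getrandbits` as a stand-in for the TRNG):

```python
import random

def rand_below(R):
    """Uniform integer in [0, R-1] via rejection sampling.

    Pick the least n with R <= 2**n, draw n bits, and retry whenever
    the drawn value is >= R. Each attempt succeeds with probability
    greater than 1/2, so the expected number of attempts is below 2.
    """
    n = max(1, (R - 1).bit_length())  # least n such that R <= 2**n
    while True:
        r = random.getrandbits(n)
        if r < R:
            return r
```

Usage: `rand_below(6) + 1` gives an unbiased die roll.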

Antti Huima

As you've observed, you can repeatedly double the range of possible random values by concatenating bits, but if you build the value purely from whole bits then the size of the range is always a power of two, so you can never obtain a range whose size has prime factors other than two.

There are several ways out; none of which are ideal:

  1. Simply produce the first reachable range which is larger than what you need, and discard the result and start again whenever the random value falls outside the desired range.
  2. Produce a very large range, and distribute that as evenly as possible amongst your desired outputs, and overlook the small bias that you get.
  3. Produce a very large range, distribute what you can evenly amongst your desired outputs, and if you hit upon one of the [proportionally] few values which fall outside of the set which distributes evenly, then discard the result and start again.
  4. As with 3, but recycle the parts of the value that you did not convert into a result.

The first option isn't always a good idea. Numbers 2 and 3 are pretty common. If your random bits are cheap then 3 is normally the fastest solution, with a fairly small chance of having to repeat.

For the last one; supposing that you have built a random value r in [0,31], and from that you need to produce a result x [0,5]. Values of r in [0,29] could be mapped to the required output without any bias using mod 6, while values [30,31] would have to be dropped on the floor to avoid bias.

In the former case, you produce a valid result x, but there's some more randomness left over -- the difference between the ranges [0,5], [6,11], etc., (five possible values in this case). You can use this to start building your new r for the next random value you'll need to produce.

In the latter case, you don't get any x and are going to have to try again, but you don't have to throw away all of r. The specific value picked from the illegal range [30,31] is left-over and free to be used as a starting value for your next r (two possible values).

The random range you have from that point on needn't be a power of two. That doesn't mean it'll magically reach the range you need at the time, but it does mean you can minimise what you throw away.

The larger you make r, the more bits you may need to throw away if it overflows, but the smaller the chances of that happening. Adding one bit halves your risk but increases the cost only linearly, so it's best to use the largest r you can handle.
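The recycling idea above can be sketched in Python as follows (the state pair `(r, m)` and the class name are illustrative, not a fixed algorithm; `trng()` is a stand-in bit source):

```python
import random

def trng():
    # Stand-in for the true random bit source: returns 0 or 1.
    return random.getrandbits(1)

class BitRecycler:
    """Option 4: keep the leftover entropy from every draw.

    Invariant: r is uniform in [0, m-1]. Bits are appended until the
    range can cover 6 outcomes; on success the quotient r // 6 is
    recycled, on rejection the offset into the rejected tail is.
    """
    def __init__(self):
        self.r = 0  # current uniform value
        self.m = 1  # current range size: r is uniform in [0, m-1]

    def roll_die(self):
        while True:
            # Grow the range until it can cover six outcomes.
            while self.m < 6:
                self.r = (self.r << 1) | trng()
                self.m <<= 1
            usable = self.m - self.m % 6  # largest multiple of 6 <= m
            if self.r < usable:
                face = self.r % 6 + 1
                # Recycle the quotient: uniform in [0, usable//6 - 1].
                self.r //= 6
                self.m = usable // 6
                return face
            # Rejected: recycle the offset into the tail [usable, m-1].
            self.r -= usable
            self.m -= usable
```

With m = 32 this reproduces the worked example: usable is 30, a success leaves five recyclable values (the quotient) and a rejection leaves two.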

sh1
  • I like your set of options, but not sure why you call option one a bad idea. It's perfectly accurate, and can be quite efficient in both speed and bits of randomness used if done well. In the given case (range of 6), it goes through the loop 1.33 times on average, thus using about 4 bits per value. – Lee Daniel Crocker Aug 08 '13 at 01:00
  • The problem with option 1, in contrast with option 3, is that it maximises the chances of failure, and I don't think there's ever a good reason to do that. If random bits are cheap then you should use them to maximise your chance of success on the first attempt. This is how Java does it. If they're expensive then you should recycle; which I've demonstrated [here](http://stackoverflow.com/a/17749722/2417578). That algorithm is for speed, but if you put in bit counters you'll see that it's an almost perfectly efficient consumer. – sh1 Aug 08 '13 at 08:17
  • There's also a fifth way -- another approximate method with vanishingly small error but predictable execution time and efficient use of entropy -- but I haven't managed to bundle it up into a clear and tangible algorithm just yet. – sh1 Aug 08 '13 at 08:20
  • I wouldn't call that "failure". Yes, you're maximizing the variability in execution time, but that's really not an issue for most applications where it's average time that's most important. "Recycling" is cool, but slow, so I only recommend that if bits are so expensive that it's worth the extra work. Option 3 is a good compromise too, and lots of libraries do that. But all the options except #1 require a division operation, so #1 will always have the fastest average time, even in the worst case of 50% rejection, on machines where division is expensive. – Lee Daniel Crocker Aug 08 '13 at 09:13
  • Division isn't _that_ expensive. If the range is a constant it can be turned into a fixed-point multiply, and if it's not then you have to figure out how many bits you need. Between that and the high risk of branch misprediction and the cost of either retaining the unused bits or generating the discarded ones, I can't see it offering a reliable performance advantage. – sh1 Aug 08 '13 at 09:34
  • Yep, you do need lots of branches for #1, and that can indeed be more of a performance hit than division on some machines. – Lee Daniel Crocker Aug 08 '13 at 09:36