
In my embedded project, I have a biginteger class that handles arbitrary length integers. I would like to be able to generate a random bigint between 0 and an arbitrary number. Assume I have a quality source of random bytes.

All the implementations I have seen essentially do the same thing:

  1. Generate a big number with the correct number of bytes,
  2. If it is greater than max, generate again.

The problem I see with this implementation is that it could take an awfully long time. Imagine that max = 2^2049-1 = (01 FF .. FF). This algorithm will generate 257 bytes, then check that the most significant byte is <= 1. So there is a 254/256 chance it has to generate a whole new 257-byte number. In the (admittedly unlikely) worst case, this loop could go on for minutes or years.
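Roughly, the pattern above looks like this (a Python sketch; `os.urandom` stands in for the quality byte source, and the upper bound is taken as inclusive):

    import os

    def random_below_or_equal(max_value):
        """Naive rejection sampling: draw as many bytes as max_value has, retry if too big."""
        num_bytes = (max_value.bit_length() + 7) // 8
        while True:
            candidate = int.from_bytes(os.urandom(num_bytes), "big")
            if candidate <= max_value:   # for max = 2^2049-1 this rarely holds,
                return candidate         # so the loop may run many times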

My question is:
In the case where the generated number is too large, is there a way to keep most of the bytes I have already generated?
Is it valid to just regenerate the most significant byte, or does that introduce bias? What about shifting the result right one digit?

Is there any way to make the time deterministic, while still avoiding bias?

--

Another edge case: max = 2^2048 + 1 = (01 00 .. 01). In this case the most significant byte can only be non-zero if all the remaining bytes are 00, except for the last one, which must be 00 or 01. So most of the time, if the MSB is non-zero, then the number is invalid, and regenerating just that byte only helps if it happens to come up zero. But just force-setting it to zero seems wrong also.

AShelly
  • I don't see how regenerating only a part of the bytes could introduce a bias. You are just replacing some random bytes with other random bytes. In essence your problem boils down to choosing n random numbers between 0 and 255, and one random number with a lower range. Seems like an easy and obvious solution to me. Right-shifting, on the other hand, would introduce a bias: in your example you never right-shift numbers whose top byte is 00 or 01 – HugoRune Nov 16 '15 at 20:39
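To illustrate the comment above: here is a sketch of that decomposition, valid only when every byte of max below the most significant one is 0xFF (as in the 2^2049-1 example); `secrets.randbelow` is assumed as the unbiased draw for the smaller range.

    import os
    import secrets

    def random_up_to_prefix_all_ones(top_max, low_bytes):
        """Uniform in [0, max] where max = top_max * 256**low_bytes + (256**low_bytes - 1),
        i.e. every byte of max below the top one is 0xFF, so the byte draws are independent."""
        top = secrets.randbelow(top_max + 1)                # small unbiased draw, e.g. 0 or 1
        low = int.from_bytes(os.urandom(low_bytes), "big")  # full-range bytes, never rejected
        return (top << (8 * low_bytes)) | low

    # max = 2^2049-1 = (01 FF .. FF): top byte in {0, 1}, 256 unconstrained bytes below it
    sample = random_up_to_prefix_all_ones(1, 256)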

3 Answers


The answer is that it is impossible in general to generate a random unbiased integer in [0, n) in constant time. One notable exception is when the source of random numbers produces unbiased random bits and n is a power of 2.

For instance, assume we have a "true" random generator and can produce unbiased random bits. Then, unless n is a power of 2, there are only two possible ways to proceed:

  • It can use modulo reduction (or Lemire's multiply-then-shift reduction). This will run in constant time, but introduce a bias (some numbers are slightly more likely to be generated than others).
  • It can use rejection sampling. This will introduce no bias, but can run forever in the worst case (even though it has an expected constant time complexity). Many kinds of algorithms fit in this category, including modulo reduction followed by a rejection step (which is necessary if n is not a power of 2), as well as the Fast Dice Roller (which uses random bits).

(See my note on integer generating algorithms for a survey of both kinds of algorithms. For a Fast Dice Roller implementation, see another answer of mine.)
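For reference, here is a minimal Python sketch of the Fast Dice Roller; `secrets.randbits(1)` is assumed here as the source of unbiased random bits.

    import secrets

    def fast_dice_roller(n):
        """Uniform integer in [0, n) from unbiased bits (Lumbroso's Fast Dice Roller).
        No bias, roughly log2(n) bits expected, but the worst case is unbounded."""
        v, c = 1, 0
        while True:
            v <<= 1
            c = (c << 1) | secrets.randbits(1)  # pull one random bit
            if v >= n:
                if c < n:
                    return c                    # accept
                v -= n                          # reject, but recycle the leftover state
                c -= n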

In this sense, Knuth and Yao showed in 1976 that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) In this case, each integer in [0, n) can occur with probability 1/n. And if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either—

  • have an "infinite" depth, or
  • include "rejection" leaves at the end of the tree.

And in either case, the algorithm won't run in constant time.

Modulo or similar reductions are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (See also chapter 15 of Non-Uniform Random Variate Generation by L. Devroye, 1986.)
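As a small, concrete illustration of that modulo bias (an added example, not part of the original answer): reducing a uniform byte modulo 3 gives one outcome an extra preimage, because 256 = 3*85 + 1.

    from collections import Counter

    # Map every equally likely byte value through "mod 3" and count the outcomes.
    counts = Counter(b % 3 for b in range(256))
    print(counts)   # Counter({0: 86, 1: 85, 2: 85}) -- outcome 0 is slightly favored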

Therefore: In general, an integer generator can be either unbiased or constant-time, but not both.

If you can't tolerate the worst case of running forever, then the only thing you can do is set a fixed maximum number of rejections or use a reduction, both of which can introduce bias. However, this bias might be negligible depending on your application (e.g., if the chance the algorithm "fails" is negligible compared to the chance it "succeeds", for the application's purposes). There are also security aspects to random integer generation, which are too complicated to discuss in this answer.
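A sketch of the "fixed maximum number of rejections" compromise; the cap of 128 retries and the modulo fallback are illustrative choices, not part of the answer. Because each masked draw succeeds with probability greater than 1/2, the biased fallback is reached with probability below 2^-128.

    import secrets

    def bounded_random_below_or_equal(max_value, max_tries=128):
        """Uniform in [0, max_value] with a hard cap on retries.
        Drawing only bit_length(max_value) bits makes each try succeed with
        probability > 1/2, so the slightly biased fallback is almost never used."""
        bits = max(max_value.bit_length(), 1)
        for _ in range(max_tries):
            candidate = secrets.randbits(bits)   # only as many bits as max_value has
            if candidate <= max_value:
                return candidate
        return secrets.randbits(bits + 64) % (max_value + 1)  # rare, biased fallback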

Peter O.

If your arbitrary maximum number is a power of two minus one, then a source of random bits, e.g., a coin toss, can be used to fill in the bits. This gives a number with uniform distribution. You can use a high quality RNG to generate the bits in groups of 32 or 64, and truncate the last word without bias.

Now, if your arbitrary maximum number is not a power of two minus one, use the above technique to create a uniform fraction over the range 0..1. The more bits you use for the fraction, the less bias you will have in the result.

For example, calling your arbitrary maximum number M, choose an n so that

2^n >> M /* 2^n is much greater than M */

Now, your random number is

M * (rand(2^n) / 2^n)

where rand is the procedure described in the first paragraph above.
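A sketch of that calculation in Python; choosing n as M's bit length plus 128 extra bits is an assumption standing in for "2^n >> M", and, as the comments below point out, the result is close to uniform but not exactly unbiased.

    import secrets

    def scaled_random(M, extra_bits=128):
        """Deterministic-time draw in [0, M): scale an n-bit uniform fraction by M.
        The bias shrinks as extra_bits grows, but it does not disappear entirely
        unless M is a power of two."""
        n = M.bit_length() + extra_bits   # so that 2**n >> M
        r = secrets.randbits(n)           # uniform in [0, 2**n)
        return (M * r) >> n               # floor(M * r / 2**n)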

Doug Currie
  • The first paragraph of this answer is absolutely correct. The rest is wrong, because it doesn't *eliminate* the bias. It's also impractical when you consider that `M` is going to be a very large number. – Mark Ransom Nov 16 '15 at 21:57
  • Re: eliminating bias, by increasing `n` I believe you can reduce the bias to below that of the source RNG, and in any case make it arbitrarily small. Re: practicality, that can only be tested if there is a stated requirement. The only requirement was time determinism, which this algorithm certainly meets. The code is dealing with size-M numbers anyway; an extra multiply and shift may not be that significant! – Doug Currie Nov 16 '15 at 22:46

A random number generator creates random numbers with an integral number of bits. If the number is truly statistically random then each bit is independent of the others, and you can use or throw away any combination of them. For your example, you can simply throw away 7 bits and have an unbiased number.
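For the question's 2^2049-1 example, that bit discarding could look like the following sketch (with `os.urandom` standing in for the question's random byte source):

    import os

    def random_2049_bits():
        """Draw 257 random bytes (2056 bits) and keep only 2049 of them:
        mask the top byte down to its one useful bit, discarding the other 7."""
        raw = bytearray(os.urandom(257))
        raw[0] &= 0x01                      # keep 1 bit of the MSB, throw away 7
        return int.from_bytes(raw, "big")   # uniform over [0, 2^2049 - 1]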

For ranges that are not a power of 2, you can factor the size of the range and get a random number for each factor, then combine them. If we assume a function randint(n) that delivers an unbiased random number between 0 and n-1, the general formula is:

(((randint(A) * B + randint(B)) * C + randint(C)) * D + randint(D)) ...

For example, if your range was 0 to 10^616-1, you could factor its size, 10^616, into 5^616 * 2^616.

rand_10_616 = randint(5^616) * 2^616 + randint(2^616)

Obviously you still have a problem getting an unbiased result for 5^616, but it's a smaller problem to solve.
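A sketch of that composition, assuming an unbiased randint-style primitive is available for each factor (here `secrets.randbelow` plays that role, and the factors are taken as given rather than computed):

    import secrets

    def random_from_factors(factors):
        """Uniform in [0, product(factors)), built as
        (((randint(A))*B + randint(B))*C + randint(C)) ... per the formula above."""
        result = 0
        for f in factors:
            result = result * f + secrets.randbelow(f)
        return result

    # The answer's example: range 0 .. 10^616-1, with size 10^616 = 5^616 * 2^616.
    sample = random_from_factors([5**616, 2**616])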

Mark Ransom
  • The ranges are not guaranteed to be powers of 2. And if I could [factor bigints efficiently](https://en.wikipedia.org/wiki/Integer_factorization), wouldn't that mean I have cracked RSA? – AShelly Nov 18 '15 at 19:18
  • @AShelly yes indeed it would. Since you never specified how you obtained your random number limit, it wasn't possible to know if this answer was relevant or not. – Mark Ransom Nov 18 '15 at 19:54