
I need to generate unbiased, uniformly distributed integers in an arbitrarily large range, using random bits that can be expensive to obtain, so I want to use as few random bits as possible on average.

The generated numbers should be between 0 and N-1, where N is the given range.

What I am doing now is:

  1. Calculate the number of bits B in N by some means; mathematically, B = ceil(log(N)/log(2)).
  2. Generate B random bits.
  3. If the random bits generated form an integer less than N, return them. Otherwise go to step 2.
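The steps above can be sketched as follows. This is only an illustration, using Python's `random.getrandbits` as a stand-in for the expensive bit source:

```python
import random

def uniform_below(n: int) -> int:
    """Draw a uniform integer in [0, n) by rejection sampling B-bit values."""
    b = (n - 1).bit_length()       # B = ceil(log2(n)): bits needed for n - 1
    while True:
        x = random.getrandbits(b)  # B fresh random bits
        if x < n:
            return x               # accept: every value in [0, n) is equally likely
        # otherwise reject and draw a fresh batch of B bits
```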

The best case is when N is a power of two; then you don't reject anything. But the worst case is when N is a power of two plus one; in that case, you get asymptotically close to a probability of rejection of 1/2 on every attempt, as B grows. That strikes me as wasteful, so I wondered if there are better algorithms that need fewer bits on average.

For example, when the generated bits are >= N, if I permute bit positions in a predefined order instead of rejecting them, with the hope of finding one such permutation that is within range, and only generate a new batch if none of the permutations succeeds, would the result be uniformly distributed?

Pedro Gimeno
  • 2,837
  • 1
  • 25
  • 33
  • It would be useful to know: how many random numbers do you need? What range of N are you interested in? What magnitude of bias, if any, is allowed? How expensive is one bit? Are there memory or runtime constraints? – Unlikus May 15 '23 at 11:00
  • You might find useful answers here: https://stackoverflow.com/q/6046918/270986 – Mark Dickinson May 15 '23 at 17:11
  • Note, it is possible to use a random entropy source to seed a generator that uses various techniques to generate more bits than it has entropy for. The result is better than pseudorandom while being worse than truly random. Many operating systems offer access to implementations. See https://en.wikipedia.org/wiki//dev/random for more. – btilly May 15 '23 at 21:44

2 Answers


Here's a method based on arithmetic coding:

  1. Start with a range [0,N)

  2. Generate a random bit and use it to select either the top or bottom half of the range, e.g. [0,5) -> [5/2, 5).

  3. When the whole range fits within one integer step, then stop and return that integer.

Note that you will have to work with fractions.

Using this method, once the range width gets smaller than 1, the probability of needing another bit is always <= 50%, and when you do, it costs only one bit. The expected number of bits required is at most ceil(log2(N)) + 1.

E.g.: [0,5) -> [0,5/2) -> [5/4, 10/4) -> [15/8, 20/8) -> [35/16, 40/16)

We got unlucky when we picked [15/8, 20/8). The alternative was [10/8, 15/8), which fits entirely within [1,2) and would have terminated.

The next bit fixed the problem, though: since 2 <= 35/16 < 40/16 <= 3, the answer is 2.
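A minimal integer-only sketch of this method, following btilly's observation below that the fraction is always an integer over a power of two (`random.getrandbits` again stands in for the costly bit source):

```python
import random

def uniform_below(n: int) -> int:
    """Uniform integer in [0, n) via binary arithmetic coding.

    The current range is [lo / 2**t, hi / 2**t); each random bit halves it.
    """
    lo, hi, t = 0, n, 0
    while True:
        m = lo >> t
        if hi <= (m + 1) << t:       # range fits inside [m, m+1): done
            return m
        mid = lo + hi                # midpoint numerator over 2**(t + 1)
        lo, hi, t = 2 * lo, 2 * hi, t + 1
        if random.getrandbits(1):
            lo = mid                 # keep top half
        else:
            hi = mid                 # keep bottom half
```

For n = 5 and the bit sequence 0, 1, 1, 1 this traces exactly the ranges in the example above and returns 2.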

Matt Timmermans
  • You have to work with fractions, but the fraction is always some integer over a power of 2. And the power of 2 is the number of bits you've generated. So it isn't hard to write this using only integers. – btilly May 15 '23 at 17:25

No, your permuting strategy is biased. Imagine you permute the first and the last bit: your numbers then become more likely to be odd than even. I am not 100% sure that no permutation strategy could work, but I don't think so.

Assuming you need more than one random number, you can optimize this.

First generate a number between 0 and N^k - 1, choosing k such that N^k is close to but smaller than a power of two. You can then extract k numbers from 0 to N-1 (its base-N digits). For example, for N = 17 you can choose k = 11, since 17^11 = 34271896307633 < 35184372088832 = 2^45.
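This batching idea can be sketched as follows (again with `random.getrandbits` as a placeholder bit source; the function name is just for illustration):

```python
import random

def batch_uniform(n: int, k: int) -> list:
    """Draw k uniform integers in [0, n) from one rejection-sampled draw in [0, n**k)."""
    limit = n ** k
    b = (limit - 1).bit_length()       # e.g. 45 bits for n = 17, k = 11
    while True:
        x = random.getrandbits(b)
        if x < limit:                  # rejection is rare when n**k is near 2**b
            break
    out = []
    for _ in range(k):                 # peel off the base-n digits
        x, digit = divmod(x, n)
        out.append(digit)
    return out
```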

This strategy does not scale well with N: for N = 2^20 + 1 you already need k of approximately 700,000.

Unlikus