Modulo bias is a problem that arises when naively using the modulo operation to get pseudorandom numbers smaller than a given "upper bound".
Therefore as a C programmer I am using a modified version of the arc4random_uniform()
function to generate evenly distributed pseudorandom numbers.
The problem is I do not understand how the function works, mathematically.
This is the function's explanatory comment, followed by a link to the full source code:
/*
* Calculate a uniformly distributed random number less than upper_bound
* avoiding "modulo bias".
*
* Uniformity is achieved by generating new random numbers until the one
* returned is outside the range [0, 2**32 % upper_bound). This
* guarantees the selected random number will be inside
* [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
* after reduction modulo upper_bound.
*/
From the comment above we can define:
[2^32 % upper_bound, 2^32)
- interval A[0, upper_bound)
- interval B
In order to work, the function relies on the fact that interval A maps to interval B.
My question is: mathematically, how come the numbers in interval A map uniformly to the ones in interval B? And is there a proof for this?