How to test bias
The rand()
generator on your system (the one in glibc) has problems, but excessive bias is not among them. Let's assume that you use the following code to generate random numbers within a given range.
int random_int(int min, int max)
{
return min + rand() % (max - min + 1);
}
Let's not assume you seed the numbers.
int main(int argc, char **argv)
{
int histo[10];
for (int i = 0; i < 10; i++)
histo[i] = 0;
for (int i = 0; i < 10000; i++)
histo[random_int(1, 10) - 1]++;
for (int i = 0; i < 10; i++)
printf("%d\n", histo[i]);
}
This will give us 10,000 samples, which is small but workable. I get the following results. If you are using the same version of glibc, you will get the exact same.
1053
980
1002
959
1009
948
1036
1041
987
985
We expect bins to follow the binomial distribution, given an unbiased generator. For 10,000 samples, we expect the per-bin variance to be Np(1-p) or 900, which gives a standard deviation of exactly 30. Our sample variance is 1105. Now, I'm not going to do anything rigorous here... I'm going to pretend that the binomial distributions are normal... and I'm just going to do a simple chi-square test. The results are p=0.2. Not exactly damning.
So if you want to test your random number generator, remember to do the math afterwards to interpret the results of your test.
Modulo bias
The modulo bias actually increases the probability of lower numbers, not higher numbers. The bias is very small for such ranges (1..10) because RAND_MAX
is 231-1 for glibc, and this gives increases the probability of small numbers by something like 1 in 200 million. You would need to perform a larger number of tests to expose modulo bias.
The main reason that modulo is discouraged is because the low bits of common rand()
implementations show poor independence. Of course, you also shouldn't use this technique to generate large ranges.
Recommendations
If you really want to test your random number generator, I suggest looking at the late Marsaglia's "Diehard" tests. If you just want a good random number generator, you can use arc4random, Mersenne Twister, or /dev/urandom
. Your choice will differ depending on whether you are developing a cryptographic application or using the results for Monte Carlo simulation.