Why is there a probability of higher numbers using rand()

Question

Weird question I guess. It's out of curiosity.

Using rand() function, if we set the parameters between 1-10, i then ran a test a few times on my machines UNIX operating system, more specifically Ubuntu. My results always showed higher numbers (greater then 5) being more likely returned. It didn't seem at all as if it was random.

I also read up on the modulus which states that using the modulus operation, we form some kind of bias.

Notice though that this modulo operation does not generate uniformly distributed random numbers in the span (since in most cases this operation makes lower numbers slightly more likely).

Why is that? Also it said lower numbers become more likely, however I get more higher numbers

You cannot set the range of the `rand()` function, show your code. — Dietrich Epp, Dec 28 '14 at 22:24
From Mr. STL himself: [rand() Considered Harmful](http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful) — bolov, Dec 28 '14 at 22:24
@πάντα ῥεῖ: That's not a duplicate of this question. — Dietrich Epp, Dec 28 '14 at 22:26
@DietrichEpp Find a better one, I'm pretty sure it exists. As is, the question is just crap, needs to be sorted out. — πάντα ῥεῖ, Dec 28 '14 at 22:27
@πάντα ῥεῖ: You're the one who closed... it was your responsibility to mark the correct duplicate. I think this question stems from a misunderstanding of how to correctly test the uniformity of random number generators (i.e. how to perform statistical tests). — Dietrich Epp, Dec 28 '14 at 22:30
You have two different questions. One is about the behavior of your test case; we can't answer that without seeing your code (and thereby explaining what you mean by "set the parameters between 1-10"). The second is about the use of the `%` modulo operator; that's discussed in question 13.16 of the [comp.lang.c FAQ](http://www.c-faq.com/). — Keith Thompson, Dec 28 '14 at 22:31
@πάντα ῥεῖ: I've reopened the question. I agree that it needs some work, as I commented above. Feel free to vote to close it for that reason, but I don't think it's a duplicate. (I prefer to give the OP a little time to improve the question, but YMMV.) — Keith Thompson, Dec 28 '14 at 22:33
[This](http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator?rq=1) seems to be an exhausting discussion on modulo bias. — PM 77-1, Dec 28 '14 at 22:38
[Another post](http://stackoverflow.com/questions/10219355/does-n-rand-rand-max-make-a-skewed-random-number-distribution) covering the topic. — PM 77-1, Dec 28 '14 at 22:42
One more write-up: [Misconceptions about rand()](http://www.azillionmonkeys.com/qed/random.html). — PM 77-1, Dec 28 '14 at 22:44
I've voted to close this question (referring to the part that's about the behavior of the OP's test case) because the test case is not included in the question. Please update your question with that information. — Keith Thompson, Dec 28 '14 at 22:45

score 3 · Answer 1 · answered Dec 28 '14 at 23:05

How to test bias

The rand() generator on your system (the one in glibc) has problems, but excessive bias is not among them. Let's assume that you use the following code to generate random numbers within a given range.

int random_int(int min, int max)
{
    return min + rand() % (max - min + 1);
}

Let's not assume you seed the numbers.

int main(int argc, char **argv)
{   
    int histo[10];
    for (int i = 0; i < 10; i++) 
        histo[i] = 0;
    for (int i = 0; i < 10000; i++) 
        histo[random_int(1, 10) - 1]++;
    for (int i = 0; i < 10; i++)
        printf("%d\n", histo[i]);
}

This will give us 10,000 samples, which is small but workable. I get the following results. If you are using the same version of glibc, you will get the exact same.

We expect bins to follow the binomial distribution, given an unbiased generator. For 10,000 samples, we expect the per-bin variance to be Np(1-p) or 900, which gives a standard deviation of exactly 30. Our sample variance is 1105. Now, I'm not going to do anything rigorous here... I'm going to pretend that the binomial distributions are normal... and I'm just going to do a simple chi-square test. The results are p=0.2. Not exactly damning.

So if you want to test your random number generator, remember to do the math afterwards to interpret the results of your test.

Modulo bias

The modulo bias actually increases the probability of lower numbers, not higher numbers. The bias is very small for such ranges (1..10) because RAND_MAX is 2³¹-1 for glibc, and this gives increases the probability of small numbers by something like 1 in 200 million. You would need to perform a larger number of tests to expose modulo bias.

The main reason that modulo is discouraged is because the low bits of common rand() implementations show poor independence. Of course, you also shouldn't use this technique to generate large ranges.

Recommendations

If you really want to test your random number generator, I suggest looking at the late Marsaglia's "Diehard" tests. If you just want a good random number generator, you can use arc4random, Mersenne Twister, or /dev/urandom. Your choice will differ depending on whether you are developing a cryptographic application or using the results for Monte Carlo simulation.

Why is there a probability of higher numbers using rand()

1 Answers1

How to test bias

Modulo bias

Recommendations