7

I was reading the C FAQ and found a question that recommends using rand() / (RAND_MAX / N + 1) instead of the more popular rand() % N.

The reasoning is that when N is a small number, rand() % N uses only a few of the low-order bits of rand().

I tested the two approaches with N = 2 on both Windows and Linux, but could not see a difference.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 2

int main(void)
{
    srand(0);   /* fixed seed so both loops replay the same sequence */
    printf("rand() %% N:\n");
    for (int i = 0; i < 40; ++i) {
        printf("%d ", rand() % N);
    }
    putchar('\n');

    srand(0);   /* reseed so the second loop sees the same sequence */
    printf("rand() / (RAND_MAX / N + 1):\n");
    for (int i = 0; i < 40; ++i) {
        printf("%d ", rand() / (RAND_MAX / N + 1));
    }
    putchar('\n');

    return 0;
}

The output is this (on my GNU/Linux machine):

rand() % N:
1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 
rand() / (RAND_MAX / N + 1):
1 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 1 

Both alternatives seem perfectly random to me. It even seems like the second approach is worse than rand() % N.

Should I use rand() % N or rand() / (RAND_MAX / N + 1)?

wefwefa3
    That's not how you notice differences in this scenario. Your mistake was this: _"Both alternatives seem perfectly random to me"_. You need an analytical approach. Think of it like this: the only way to truly measure this would be to get some statistics on an infinitely long sequence. – keyser Dec 30 '14 at 13:55
  • This is a poor example with `#define N 2`, since `RAND_MAX` is an odd number and `rand() % N` will return an equal distribution of 0 and 1, given the limitations of the pseudo-random generator. There is absolutely no difference between the two methods here, only in the sequence of the pseudo-random results. But in general, the larger `N` is, the more significant the modulo bias. – Weather Vane Dec 30 '14 at 15:00

3 Answers

6

If N is a power of two, using the remainder technique is usually safe (RAND_MAX is usually a power of two minus 1, so the entire range has a power-of-two length). More generally, N has to divide the range of rand() in order to avoid the bias.

Otherwise, you run into this problem, regardless of the quality of rand(). In short, the problem is that you're chopping that range into a number of "parts", each of length N; if N does not divide the range, the last part will not be complete. The numbers that got "cut off" from that part are therefore less likely to occur, since they have one fewer part they can be generated from.

Unfortunately, rand() / (RAND_MAX / N + 1) is also broken (in almost the same way), so the real answer is: don't use either of them.

The problem outlined above is really fundamental: there is no way to evenly distribute X different values over Y results unless Y divides X. You can fix it by rejecting a part of the random samples, so that Y divides the new X, as sketched below.
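
A minimal sketch of that rejection step in C (the helper name rand_uniform and its exact shape are mine, not from the answer):

#include <stdlib.h>

/* Return an unbiased value in [0, n), n > 0, by resampling any
   rand() result that falls in the incomplete final part. */
int rand_uniform(int n)
{
    /* Number of values rand() can produce, and the largest multiple of n
       that fits in that range; unsigned arithmetic avoids overflow when
       RAND_MAX == INT_MAX. */
    unsigned range = (unsigned)RAND_MAX + 1u;
    unsigned limit = range - range % (unsigned)n;
    unsigned r;
    do {
        r = (unsigned)rand();
    } while (r >= limit);   /* reject and resample */
    return (int)(r % (unsigned)n);
}

With RAND_MAX = 7 and n = 6, limit works out to 6, so the two leftover values are resampled instead of being folded onto 0 and 1.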

harold
4

Another problem with rand() % n is that it introduces modulo bias.

For simplicity's sake, let's pretend RAND_MAX is 7 and n is 6. You want the numbers 0, 1, 2, 3, 4, 5 to appear in the random stream with equal probability. However, 0 and 1 will each appear 1/4 of the time and the other numbers only 1/8 of the time, because the outputs 6 and 7 have remainders 0 and 1 respectively. You should use the other method, but carefully, because truncation of fractions might introduce a similar issue.
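
To make the counting concrete, here is a tiny program (mine, not from the answer) that enumerates all eight equally likely outputs of that toy generator and tallies the remainders mod 6:

#include <stdio.h>

int main(void)
{
    int count[6] = {0};

    /* Every output 0..7 of the toy generator is equally likely;
       tally where each one lands after reduction mod 6. */
    for (int r = 0; r <= 7; ++r)
        count[r % 6]++;

    for (int v = 0; v < 6; ++v)
        printf("%d occurs %d/8 of the time\n", v, count[v]);
    return 0;
}

It prints 2/8 for 0 and 1 and 1/8 for the rest, matching the fractions above.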

If you have arc4random(), you can use arc4random_uniform() to achieve an unbiased distribution without having to be careful.
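
For illustration, a sketch of the usual call (assuming a BSD/macOS-style libc; on Linux this typically needs libbsd):

#include <stdio.h>
#include <stdlib.h>   /* declares arc4random_uniform() on BSD/macOS */

int main(void)
{
    /* Unbiased values in [0, 6); the rejection step happens inside
       arc4random_uniform(), and no seeding is required. */
    for (int i = 0; i < 10; ++i)
        printf("%u ", (unsigned)arc4random_uniform(6));
    putchar('\n');
    return 0;
}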

chux - Reinstate Monica
JeremyP
  • @David Eisenstat random numbers don't form a perfectly uniform distribution, but a very large sample should statistically approach one. – Weather Vane Dec 30 '14 at 15:03
  • Yes, truncation of fractions might introduce a similar issue, but that bias is more evenly spread: the first approach biases towards lower numbers, whereas the second biases towards numbers spread evenly across the range, which might be better behavior for some applications. – ElKamina Dec 30 '14 at 17:56
0

On avr-gcc:

I was using rand() & 0xFF to get a random number from 0 to 255, and the results were not good. It turned out that using the lower bits is not a very reliable method; it often produced the same values. It could be similar with modulo.

rand() / (RAND_MAX / N + 1) worked much better for me.
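
For comparison, a sketch putting the two reductions side by side; N = 256 is my choice to match the 0-to-255 range above, and the actual output depends on the platform's rand():

#include <stdio.h>
#include <stdlib.h>

#define N 256

int main(void)
{
    srand(1);
    for (int i = 0; i < 8; ++i)
        printf("%d ", rand() & 0xFF);                /* low-order bits */
    putchar('\n');

    srand(1);
    for (int i = 0; i < 8; ++i)
        printf("%d ", rand() / (RAND_MAX / N + 1));  /* high-order bits */
    putchar('\n');
    return 0;
}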

Michal Dobrodenka
  • If you are worried about randomness and good results: the `man` page on my Mac describes `rand` as "bad random number generator". See [What is a suitable replacement for rand()?](http://stackoverflow.com/questions/7421424/what-is-a-suitable-replacement-for-rand) for alternatives. – Jongware Jun 15 '15 at 14:35