19

One common way of choosing a random number in [0, n) is to take the result of rand() modulo n: rand() % n. However, even if the results returned by the available rand() implementation are fully uniform, shouldn't there be a problem with the uniformity of the resulting [0, n) numbers when RAND_MAX + 1 is not evenly divisible by n? E.g. suppose RAND_MAX is 2 and n is 2. Then out of the 3 possible rand() outputs, 0, 1 and 2, we get 0, 1 and 0 respectively after taking them modulo n, so 0 comes up twice as often as 1 and the output is not uniform at all.

Is this a real problem in practice? What is a better way of choosing random numbers uniformly in [0, n), derived from the rand() output, preferably without any floating point arithmetic?
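
To make the example concrete, here is a tiny sketch that enumerates every output of an idealized, perfectly uniform generator and counts the residues (FAKE_RAND_MAX is just a stand-in for the tiny RAND_MAX in the example, not a real implementation's value):

#include <stdio.h>

/* Pretend the generator returns each value in [0, FAKE_RAND_MAX] exactly
   once, and count how often each residue mod n shows up. */
#define FAKE_RAND_MAX 2

int main(void) {
    const int n = 2;
    int counts[2] = {0, 0};
    for (int x = 0; x <= FAKE_RAND_MAX; x++)
        counts[x % n]++;
    printf("residue 0: %d, residue 1: %d\n", counts[0], counts[1]);
    /* prints "residue 0: 2, residue 1: 1" -- 0 is twice as likely as 1 */
    return 0;
}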

dragonroot
  • possible duplicate of [What is the optimal algorithm for generating an unbiased random integer within a range?](http://stackoverflow.com/questions/11758809/what-is-the-optimal-algorithm-for-generating-an-unbiased-random-integer-within-a) – hammar Oct 27 '12 at 21:54
  • Not quite a duplicate, since the bias issue is taken for granted and the question is "is this really a problem in practice?" I've tried to quantify the bias in my answer. – slashingweapon Oct 27 '12 at 21:58
  • See: http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx – Alex Reynolds Oct 27 '12 at 22:09
  • @MuriloVasconcelos: yes, I've just accepted one. – dragonroot Oct 29 '12 at 07:07
  • Possible duplicate of [Why do people say there is modulo bias when using a random number generator?](http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator) – Emil Laine Oct 01 '16 at 09:34

4 Answers

10

You are correct: rand() % N is not precisely uniformly distributed. How much that matters depends on the range of numbers you want and the degree of randomness you need, but if you need enough randomness that you'd even care about it, you don't want to use rand() anyway. Get a real random number generator.

That said, to get a truly uniform distribution, reduce modulo the next power of 2 and resample until you get a value in the range you want (e.g. for 0-9, use while ((n = rand() % 0x10) >= 10);).
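
A rough sketch of that approach (the rand_below helper below is illustrative, not a standard function):

#include <stdlib.h>

/* Sketch: mask rand() down to the next power of two at or above n,
   then retry until the value lands in [0, n). */
int rand_below(int n) {
    int mask = 1;
    while (mask < n)
        mask <<= 1;              /* smallest power of two >= n */

    int r;
    do {
        r = rand() & (mask - 1); /* same as rand() % mask for powers of 2 */
    } while (r >= n);            /* reject values in [n, mask) */
    return r;
}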

Kevin
  • +1 for the workaround, but usually the low bits of `rand()` have very bad entropy. It would be smarter to use the high bits. – R.. GitHub STOP HELPING ICE Oct 27 '12 at 22:21
  • Why would you not consider `rand()` a real RNG? – dragonroot Oct 27 '12 at 22:51
  • @dragonroot It's just not a very good (high-entropy) RNG. As R says, the low bits are particularly bad, but the whole thing is not as random as it could be. If you want better randomness, use something like the [Gnu Scientific Library (GSL)](http://www.gnu.org/software/gsl/). – Kevin Oct 27 '12 at 23:02
  • @Kevin: Are you judging any particular implementation of rand(), i.e. the one found in modern glibc? – dragonroot Oct 27 '12 at 23:44
  • @Kevin — Actually, `rand() % N` is perfectly uniformly distributed when `N` is a power of 2, provided that `rand()` is also uniformly distributed, which it had darn well better be. So it's not entirely correct to say that `rand() % N` is not uniformly distributed, because it depends on `N`. – Todd Lehman Jul 30 '15 at 03:55
  • @R.. — That might have been true 20 years ago. Not true today. – Todd Lehman Jul 30 '15 at 03:57
  • @ToddLehman: Citation needed. I suspect if you do a survey of `rand` implementations you'll find LCG is the most common. glibc is the prime exception and it has conformance bugs (sharing state with `random` last I checked). – R.. GitHub STOP HELPING ICE Jul 30 '15 at 04:04
  • @R.. — But just because a given implementation uses LCG at its core does not mean that the generator has poor entropy in the lower bits. It's trivial to work around the issue by invoking the LCG twice and bit shifting and xor'ing. For example, in a 64-bit PRNG: `static uint64_t x = 0; uint64_t r = (x = lcg(x)); r ^= (x = lcg(x)) >> 32; return r;` But yes, OK, citation is needed for the claim. – Todd Lehman Jul 30 '15 at 04:13
  • @ToddLehman On my system (OSX 10.10), the low bits are certainly not uniform. Run this on the command line for a live-updating count: http://pastebin.com/D5r7we3H – Kevin Jul 30 '15 at 04:44
  • @ToddLehman: I meant my suspicion is that most implementations use raw LCG output, not just LCG "at [the] core". – R.. GitHub STOP HELPING ICE Jul 30 '15 at 04:51
  • @R.. — Let's hope not! – Todd Lehman Jul 30 '15 at 04:54
  • @Kevin — Cool program. Compiled it with `Apple LLVM version 6.0 (clang-600.0.54)` on OS X 10.9 & 10.10. Looks super uniform to me (I tried it with N=32, 16, 4, and 2). – Todd Lehman Jul 30 '15 at 05:00
  • @ToddLehman: I think your hopes are misplaced. `rand` is hopelessly bad (it can generate at most `UINT_MAX` possible sequences due to `srand`'s signature) and trying to make it "better" only encourages people to use it. Rolling your own decent (non-cryptographic) PRNG is just a few lines and that's really what you should do - beyond sequence quality, it gives you useful properties like lack of global state, resumability, and cross platform same-sequence-for-same-seed. – R.. GitHub STOP HELPING ICE Jul 30 '15 at 13:50
  • @R.. — Good point. I see a couple people have written "rand() considered harmful" articles. Personally, I haven't used `rand` since the 1980s because of its terrible range. Lately, been using a two-phase 64-bit LCG with values from Knuth and am pretty satisfied with that (I don't need crypto strength randomness, just enough for games). I guess I was arguing more in theory, but I think you're absolutely right that people should eschew `rand` because it can't be trusted. Rolling your own is tricky, but at least it's a known thing. – Todd Lehman Jul 30 '15 at 19:02
7

That depends on:

  • The value of RAND_MAX
  • Your value of N

Let us assume RAND_MAX + 1 is 2^32. The worst-case bias is roughly N / (RAND_MAX + 1), because when the division isn't exact some residues get one extra chance to be produced. If N is rather small (let's say 2) then the bias is at most 1 / 2^31 -- too small to notice.

But if N is quite a bit larger, say 2^20, then the bias is 1 / 2^12, or about 1 in 4096. A lot bigger, but still pretty small.
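
If you want to put a number on it yourself, a small sketch of that calculation (the 2^32 figure is the assumption from above):

#include <stdio.h>

/* Sketch: with M equally likely rand() outputs and q = M / N, the
   over-represented residues occur q+1 times instead of q, so the
   worst-case relative bias is about 1/q, i.e. roughly N / M. */
int main(void) {
    const unsigned long long M = 1ULL << 32;  /* assume RAND_MAX + 1 == 2^32 */
    const unsigned long long N = 1ULL << 20;
    printf("worst-case relative bias ~ 1/%llu\n", M / N);  /* 1/4096 */
    return 0;
}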

slashingweapon
  • On the contrary, I think the answer is right on point. We're assuming a PRNG that generates numbers with perfect distribution. The question is, do we care about the bias? I tried to provide a way to quantify the bias, so the asker can determine for himself whether it is tolerable to him. This is all very language non-specific. – slashingweapon Oct 27 '12 at 22:00
  • Some systems have a `RAND_MAX` of `0xffff`, leading to a *much* bigger bias. – Kevin Oct 27 '12 at 22:04
  • Worse. Visual C++ implementations have RAND_MAX==0x7FFF, left over from 16-bit MSC 3.0 on MS-DOS. – Mike Housky Oct 27 '12 at 22:15
  • @slashingweapon can you direct me to a link/resource for a formal way of calculating the bias? – themanwhosoldtheworld Feb 17 '15 at 07:35
1

One approach you can take is the following:

Knowing the value of N, you compute R_MAX = ((RAND_MAX + 1) / N) * N and discard anything above it, for uniformity.

So you can write your custom rand() function:

int custom_rand(int mod) {
    int x = rand();
    const int R_MAX = ((RAND_MAX + 1) / mod) * mod;    

    while (x > R_MAX) { // discard the result if it is bigger
        x = rand();
    }

    return (x % mod);
}
Murilo Vasconcelos
  • what if rand_max is 2^32-1? – Eamon Nerbonne Nov 22 '14 at 10:06
  • When `RAND_MAX == INT_MAX` (a common occurrence), `RAND_MAX + 1` --> undefined behavior (maybe `INT_MIN`). – chux - Reinstate Monica Dec 30 '14 at 20:12
  • I think you actually want `R_MAX = (RAND_MAX / N) * N;` and `while (x >= R_MAX)`, otherwise you have a bias for producing more zeros because `R_MAX % mod == 0`. Also, a `do { x = rand(); } while (x >= R_MAX)` would be better here, because then you wouldn't be writing `x = rand();` twice. – Todd Lehman Jul 30 '15 at 03:59
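
Putting those comments together, a corrected sketch of the function might look like this (it computes the cutoff without evaluating RAND_MAX + 1 and rejects every value at or above it, so no residue is favored):

#include <stdlib.h>

/* Sketch folding in the fixes suggested above; assumes 0 < mod <= RAND_MAX. */
int custom_rand(int mod) {
    const int R_MAX = (RAND_MAX / mod) * mod;  /* largest safe multiple of mod */
    int x;
    do {
        x = rand();
    } while (x >= R_MAX);   /* discard values that would bias the result */
    return x % mod;
}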
1

There are two problems with using the remainder (% is not a "modulo" operator in C) to map a uniform random number onto a reduced range. First, there is a slight bias toward smaller numbers (mentioned above), and second, typical PRNGs tend to be less random in the low-order bits. I seem to recall that from Knuth (The Art of Computer Programming, Vol II, Seminumerical Algorithms), along with the claim that (after translating from MIX to C) rand() % 2 is a poor source of random single bits. It's better to pick (rand() > RAND_MAX/2) (or test a high-order bit, if RAND_MAX is nearly a power of 2).

The remainder should be good enough for casual use on small intervals. Avoid it for simulations. Actually, avoid rand() altogether for large simulations or "Monte Carlo" computations. Implementations tend to have a period on the order of 2^32 or less, and it's not hard to exceed 4 billion trials on a 2+ GHz processor.
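
For the single-bit case, that advice amounts to something like this (a sketch using only the standard rand()):

#include <stdlib.h>

/* Sketch: take one random bit from the top of rand()'s range instead of
   from its (possibly weak) least significant bit. */
int random_bit(void) {
    return rand() > RAND_MAX / 2;   /* compare against the midpoint */
}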

Mike Housky
  • Typical LCGs (Linear Congruential Generators) do tend to be less random in the lower bits, yes. For example, multiplying by an odd number and adding an odd number *always* inverts the least significant bit when you have a power-of-two modulus. But it doesn't follow that typical PRNGs have this problem. All one has to do is run the LCG twice and bit-shift and xor to mix things up nicely, or other tricks. Any C library providing a `rand()` having less randomness in the lower bits is severely broken. – Todd Lehman Jul 30 '15 at 04:06
  • @ToddLehman: From the C11 specification, in a footnote to the rand() function description: "295) There are no guarantees as to the quality of the random sequence produced and some implementations are known to produce sequences with distressingly non-random low-order bits. Applications with particular requirements should use a generator that is known to be sufficient for their needs." So, such a rand() function is not considered seriously broken by ISO. Distressing, yes, but not broken. – Mike Housky Jul 30 '15 at 07:09