
Preamble

This question is not about the behavior of (P)RNGs or rand(). It's about reducing a value that is uniformly distributed over a power-of-two range with the modulo operator.

Introduction

I knew that one should not use the modulo operator % to convert a value from one range to another, for example to get a value between 0 and 5 from rand(): there will be a bias. It's explained here https://bitbucket.org/haypo/hasard/src/ebf5870a1a54/doc/common_errors.rst?at=default and in this answer: Why do people say there is modulo bias when using a random number generator?

But today, after investigating some code that looked wrong, I wrote a tool to demonstrate the behavior of modulo: https://gitorious.org/modulo-test/modulo-test/trees/master and found that the situation is not as clear as I thought.

A die is only 3 bits

I checked with 6 values in the range 0..5. Only 3 bits are needed to encode those values.

$ ./modulo-test 10000 6 3
interations = 10000, range = 6, bits = 3 (0x00000007)
  [0..7] => [0..5]

theorical occurences    1666.67 probability 0.16666667

   [   0] occurences    2446    probability 0.24460000 ( +46.76%)
   [   1] occurences    2535    probability 0.25350000 ( +52.10%)
   [   2] occurences    1275    probability 0.12750000 ( -23.50%)
   [   3] occurences    1297    probability 0.12970000 ( -22.18%)
   [   4] occurences    1216    probability 0.12160000 ( -27.04%)
   [   5] occurences    1231    probability 0.12310000 ( -26.14%)

  minimum occurences    1216.00 probability 0.12160000 ( -27.04%)
  maximum occurences    2535.00 probability 0.25350000 ( +52.10%)
     mean occurences    1666.67 probability 0.16666667 (  +0.00%)
   stddev occurences     639.43 probability 0.06394256 (  38.37%)

With 3 bits of input, the results are indeed awful, but behave as expected. See answer https://stackoverflow.com/a/14614899/611560
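For reference, the heart of the measurement boils down to something like the simplified sketch below. The real modulo-test takes its parameters from the command line and prints more statistics; the masking and counting details here are my own simplification and hard-code the 3-bit case.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const unsigned iterations = 10000;
    const unsigned range = 6;          /* target range [0 .. range-1] */
    const unsigned mask = 0x7;         /* keep only 3 bits of rand()  */
    unsigned counts[6] = { 0 };

    srand((unsigned)time(NULL));
    for (unsigned i = 0; i < iterations; i++) {
        unsigned v = ((unsigned)rand() & mask) % range;
        counts[v]++;
    }

    for (unsigned v = 0; v < range; v++)
        printf("[%u] occurrences %u probability %.8f\n",
               v, counts[v], (double)counts[v] / iterations);
    return 0;
}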

Increasing the number of input bits

What puzzled me was that increasing the number of input bits changed the results. Do not forget to increase the number of iterations (i.e. the number of samples) accordingly, otherwise the results are likely to be misleading (see Wrong statistics below).

Let's try with 4 bits:

$ ./modulo-test 20000 6 4
interations = 20000, range = 6, bits = 4 (0x0000000f)
  [0..15] => [0..5]

theorical occurences    3333.33 probability 0.16666667

   [   0] occurences    3728    probability 0.18640000 ( +11.84%)
   [   1] occurences    3763    probability 0.18815000 ( +12.89%)
   [   2] occurences    3675    probability 0.18375000 ( +10.25%)
   [   3] occurences    3721    probability 0.18605000 ( +11.63%)
   [   4] occurences    2573    probability 0.12865000 ( -22.81%)
   [   5] occurences    2540    probability 0.12700000 ( -23.80%)

  minimum occurences    2540.00 probability 0.12700000 ( -23.80%)
  maximum occurences    3763.00 probability 0.18815000 ( +12.89%)
     mean occurences    3333.33 probability 0.16666667 (  +0.00%)
   stddev occurences     602.48 probability 0.03012376 (  18.07%)

Let's try with 5 bits:

$ ./modulo-test 40000 6 5
interations = 40000, range = 6, bits = 5 (0x0000001f)
  [0..31] => [0..5]

theorical occurences    6666.67 probability 0.16666667

   [   0] occurences    7462    probability 0.18655000 ( +11.93%)
   [   1] occurences    7444    probability 0.18610000 ( +11.66%)
   [   2] occurences    6318    probability 0.15795000 (  -5.23%)
   [   3] occurences    6265    probability 0.15662500 (  -6.03%)
   [   4] occurences    6334    probability 0.15835000 (  -4.99%)
   [   5] occurences    6177    probability 0.15442500 (  -7.34%)

  minimum occurences    6177.00 probability 0.15442500 (  -7.34%)
  maximum occurences    7462.00 probability 0.18655000 ( +11.93%)
     mean occurences    6666.67 probability 0.16666667 (  +0.00%)
   stddev occurences     611.58 probability 0.01528949 (   9.17%)

Let's try with 6 bits:

$ ./modulo-test 80000 6 6
interations = 80000, range = 6, bits = 6 (0x0000003f)
  [0..63] => [0..5]

theorical occurences   13333.33 probability 0.16666667

   [   0] occurences   13741    probability 0.17176250 (  +3.06%)
   [   1] occurences   13610    probability 0.17012500 (  +2.08%)
   [   2] occurences   13890    probability 0.17362500 (  +4.18%)
   [   3] occurences   13702    probability 0.17127500 (  +2.77%)
   [   4] occurences   12492    probability 0.15615000 (  -6.31%)
   [   5] occurences   12565    probability 0.15706250 (  -5.76%)

  minimum occurences   12492.00 probability 0.15615000 (  -6.31%)
  maximum occurences   13890.00 probability 0.17362500 (  +4.18%)
     mean occurences   13333.33 probability 0.16666667 (  +0.00%)
   stddev occurences     630.35 probability 0.00787938 (   4.73%)

Question

Please explain why the results differ when the number of input bits changes (with the sample count increased accordingly). What is the mathematical reasoning behind this?

Wrong statistics

In a previous version of the question, I showed a test with 32 bits of input and only 1,000,000 iterations, i.e. 10^6 samples, and said I was surprised to get correct results. That was so wrong I'm ashamed of it: one would need many times more samples to have any confidence of covering all 2^32 values of the generator, and 10^6 is far too small compared to 2^32. Bonus points for anyone able to explain this in mathematical/statistical terms.
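For a rough sense of scale (my own back-of-the-envelope numbers, not output from modulo-test): with 32 input bits the per-value bias is on the order of 10^-10, while the statistical noise from only 10^6 samples is around 4*10^-4, so the bias is completely drowned out. A small sketch of that comparison:

#include <stdio.h>
#include <math.h>

/* Back-of-the-envelope comparison: modulo bias with 32 input bits
 * versus sampling noise with only 10^6 samples. My own numbers,
 * not produced by modulo-test. */
int main(void)
{
    const double M = 4294967296.0;    /* 2^32 possible input values      */
    const double N = 6.0;             /* size of the target range        */
    const double n = 1e6;             /* number of samples               */
    const double p = 1.0 / N;         /* ideal probability of each value */

    double q = floor(M / N);
    double r = M - q * N;
    /* values 0 .. r-1 get one extra preimage each */
    double bias  = (q + 1.0) / M - p;          /* ~7.8e-11 */
    /* standard error of an estimated frequency from n samples */
    double noise = sqrt(p * (1.0 - p) / n);    /* ~3.7e-04 */

    printf("r = 2^32 mod 6 = %.0f\n", r);
    printf("per-value bias ~ %.3e\n", bias);
    printf("sampling noise ~ %.3e\n", noise);
    return 0;
}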

Here are the wrong results:

$ ./modulo-test 1000000 6 32
interations = 1000000, range = 6, bits = 32 (0xffffffff)
  [0..4294967295] => [0..5]

theorical occurences  166666.67 probability 0.16666667

   [   0] occurences  166881    probability 0.16688100 (  +0.13%)
   [   1] occurences  166881    probability 0.16688100 (  +0.13%)
   [   2] occurences  166487    probability 0.16648700 (  -0.11%)
   [   3] occurences  166484    probability 0.16648400 (  -0.11%)
   [   4] occurences  166750    probability 0.16675000 (  +0.05%)
   [   5] occurences  166517    probability 0.16651700 (  -0.09%)

  minimum occurences  166484.00 probability 0.16648400 (  -0.11%)
  maximum occurences  166881.00 probability 0.16688100 (  +0.13%)
     mean occurences  166666.67 probability 0.16666667 (  +0.00%)
   stddev occurences     193.32 probability 0.00019332 (   0.12%)

I still have to read and re-read Zed Shaw's excellent article "Programmers Need To Learn Statistics Or I Will Kill Them All".

  • Doesn't it all make sense? I mean, wouldn't you expect them to be different? 2/8 is a lot bigger fraction than 4/4294967296. – Carl Norum Jan 30 '13 at 22:14
  • `rand() % N` is not advised because some (older?) implementations of `rand` show patterns in runs of sequential results within the low bits. Even if `rand()` were truly random on your implementation, that doesn't mean it's safe for everyone to count on it. – aschepler Jan 30 '13 at 22:14
  • 4
    @aschepler, I think he's asking about the inherent bias towards lower results if the generated random number range isn't divisible by the range you're trying to get. That's not in and of itself affected by the randomness (or lack thereof) of the random number generator implementation. – Carl Norum Jan 30 '13 at 22:15
  • Have you read "misconceptions about rand()"? http://www.azillionmonkeys.com/qed/random.html – autistic Jan 31 '13 at 03:50
  • @CarlNorum Thanks, I understand and appreciate your comment, e.g. (2^3 % 6) / 2^3 compared to (2^32 % 6) / 2^32, but you might write an answer to explain how it is going to (not) affect the results. – Yann Droneaud Jan 31 '13 at 09:31
  • @CarlNorum You really want to write an answer, because your comment made me understand why the results are different with 3, 4, 5 bits: it's either the lower 2 or lower 4 values that are affected by the bias. – Yann Droneaud Jan 31 '13 at 09:34
  • @ydroneaud, that's got to be a duplicate. Let me see if I can find one and put a link here. If you want to write an answer yourself, that's allowed (and even recommended)! – Carl Norum Jan 31 '13 at 18:35
  • 1
    Here you go: http://stackoverflow.com/questions/2509679/how-to-generate-a-random-number-from-within-a-range-c – Carl Norum Jan 31 '13 at 18:59
  • 1
    I've found some explanation in http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modulo_bias – Yann Droneaud Sep 18 '14 at 08:40

2 Answers


In essence, you're doing:

(rand() & 7) % 6

Let's assume that rand() is uniformly distributed on [0; RAND_MAX], and that RAND_MAX+1 is a power of two. It is clear that rand() & 7 can evaluate to 0, 1, ..., 7, and that the outcomes are equiprobable.

Now let's look at what happens when you take the result modulo 6.

  • 0 and 6 map to 0;
  • 1 and 7 map to 1;
  • 2 maps to 2;
  • 3 maps to 3;
  • 4 maps to 4;
  • 5 maps to 5.

This explains why you get twice as many zeroes and ones as you get the other numbers.

The same thing is happening in the second case. However, the relative weight of the "extra" values is much smaller, making their contribution indistinguishable from noise.

To summarize, if you have an integer uniformly distributed on [0; M-1], and you take it modulo N, the result will be biased towards zero unless M is divisible by N.
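A standard way to avoid this bias is rejection: discard the leftover "extra" values and call rand() again, so every result in [0; N-1] ends up with exactly the same number of preimages. A minimal sketch (not taken from any particular implementation, no error handling):

#include <stdlib.h>

/* Map rand() onto [0 .. n-1] without modulo bias by rejecting the
 * leftover values at the top of rand()'s range. Assumes 1 <= n <= RAND_MAX. */
int uniform_rand(int n)
{
    /* Largest value we may accept: up to this limit, every residue
     * class modulo n has exactly the same number of preimages. */
    int limit = RAND_MAX - ((RAND_MAX % n) + 1) % n;
    int v;

    do {
        v = rand();
    } while (v > limit);

    return v % n;
}

Since more than half of rand()'s outputs are always accepted, the loop runs fewer than two times on average.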

NPE
  • +1, and the same thing is happening with the 32-bit case. It's just that the bias is less visible because there are only 4 "extra" values in 4 billion, rather than 2 in 8. – Carl Norum Jan 30 '13 at 22:18
  • Thanks for the explanation. I was aware of the basic behavior and didn't feel the need to describe it in the question. Regarding the 32-input-bit example, it was wrong due to the reduced sample count, so it was basically my mistake. I changed the question to fix the sample count and ask a different but related thing. – Yann Droneaud Jan 31 '13 at 09:21

rand() (or some other PRNG) produces values in the interval [0 .. RAND_MAX]. You want to map these to the interval [0 .. N-1] using the remainder operator.

Write

(RAND_MAX+1) = q*N + r

with 0 <= r < N.

Then for each value in the interval [0 .. N-1] there are

  • q+1 values of rand() that are mapped to that value if the value is smaller than r
  • q values of rand() that are mapped to it if the value is >= r.

Now, if q is small, the relative difference between q and q+1 is large; but if q is large (2^32 / 6, for example), the difference cannot easily be measured.
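To check this against the numbers in the question, here is a small sketch that evaluates q, r and the two resulting probabilities for M = 2^bits and N = 6:

#include <stdio.h>

/* Evaluate q and r for M = 2^bits and N = 6, and the two resulting
 * probabilities: (q+1)/M for values below r, q/M for the others. */
int main(void)
{
    const unsigned N = 6;

    for (unsigned bits = 3; bits <= 6; bits++) {
        unsigned M = 1u << bits;
        unsigned q = M / N;
        unsigned r = M % N;
        printf("bits=%u M=%2u q=%2u r=%u  low=%.6f high=%.6f\n",
               bits, M, q, r,
               (double)(q + 1) / M,    /* values 0 .. r-1 */
               (double)q / M);         /* values r .. N-1 */
    }
    return 0;
}

This reproduces the measured plateaus: 0.25/0.125 for 3 bits, 0.1875/0.125 for 4 bits, 0.1875/0.15625 for 5 bits, and 0.171875/0.15625 for 6 bits.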

Daniel Fischer