6

In answers to this other question, the following solution is provided, curtesy of OpenBSD, rewritten for brevity,

uint32_t foo( uint32_t limit ) {
  uint32_t min = -limit % limit, r = 0;

    for(;;) {
      r = random_function();
      if ( r >= min ) break;
    }
    return r % limit;
 }

How exactly does the line uint32_t min = -limit % limit work? What I'd like to know is, is there a mathematical proof that it does indeed calculate some lower limit for the random number and adequately removes the modulo bias?

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
Johann Oskarsson
  • 776
  • 6
  • 15
  • 1
    Johann, you also mis-copied the condition. In the original answer it was "(x >= (RAND_MAX - RAND_MAX % n))" which I don't think "-limit % limit" does. – Jeffrey Aug 15 '19 at 15:38
  • @Jeffrey you did not read far enough. See this answer: https://stackoverflow.com/a/20051580/264751 – Johann Oskarsson Aug 15 '19 at 15:41
  • My problem with this is: What does "-limit" mean? What does unary minus on an unsigned value do? Is it even defined by any standard? cpppreference states that -limit here would give 2^32 - limit; is this 'official'? – Adrian Mole Aug 15 '19 at 15:54
  • @Adrian, maybe you want to read this: https://stackoverflow.com/a/2711560/264751 – Johann Oskarsson Aug 15 '19 at 15:57
  • @Johann Oskarsson - reading something similar while you were posting! Edited. – Adrian Mole Aug 15 '19 at 15:58
  • One wonders why they did not simply use `uint32_t max = -limit;` and `if (r < max) break;`. It would have the same net effect without a division-remainder operation in the initialization, although there would be more rejections when `limit` is a power of two. – Eric Postpischil Aug 15 '19 at 16:05
  • 1
    Pretty sure that on a system where `int` is 64-bits, the calculation `min = -limit % limit` will always set `min` to 0. So the OpenBSD implementation is broken unless there's more to it than what you've shown. – user3386109 Aug 15 '19 at 16:51
  • 1
    @user3386109 Yes, it ought to be `min = (uint32_t)-limit % limit`. – Ian Abbott Aug 15 '19 at 17:05

1 Answers1

7

In -limit % limit, consider that the value produced by -limit is 2wlimit, where w is the width in bits of the unsigned type being used, because unsigned arithmetic is defined to wrap modulo 2w. (The assumes the type of limit is not narrower than int, which would result in it being promoted to int and signed arithmetic being used, and the code could break.) Then recognize that 2wlimit is congruent to 2w modulo limit. So -limit % limit produces the remainder when 2w is divided by limit. Let this be min.

In the set of integers {0, 1, 2, 3,… 2w−1}, a number with remainder r (0 ≤ r < limit) when divided by limit appears at least floor(2w/limit) times. We can identify each of them: For 0 ≤ q < floor(2w/limit), qlimit + r has remainder r and is in the set. If 0 ≤ r < min, then there is one more such number in the set, with q = floor(2w/limit). Those account for all the numbers in the set {0, 1, 2, 3,… 2w−1}, because floor(2w/limit)•limit + min = 2w, so our counts are complete. For r different remainders, there are floor(2w/limit)+1 numbers with that remainder in the set, and for minr other remainders, there are floor(2w/limit) with that remainder in the set.

Now suppose we randomly draw a number uniformly from this set {0, 1, 2, 3,… 2w−1}. Clearly numbers with the remainders 0 ≤ r < min might occur slightly more often, because there are more of them in the set. By rejecting one instance of each such number, we exclude them from our distribution. Effectively, we are drawing from the set { min, min+1, min+2,… 2w−1}. The result is a distribution that has exactly floor(2w/limit) occurrences of each number with a particular remainder.

Since each remainder is represented an equal number of times in the effective distribution, each remainder has an equal chance of being selected by a uniform draw.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • 1
    As an addendum to your answer, it will break if `int` is wider than W bits as `-limit % limit` would always be 0 in that case. To fix it, use `uintW_t min = (uintW_t)-limit % limit`. – Ian Abbott Aug 15 '19 at 16:35
  • Oh, I see you already noted it will break in your answer, so sorry for assuming you hadn't noticed that! – Ian Abbott Aug 15 '19 at 17:36
  • I have accepted the answer, though I have to admit I'm still struggling to understand everything. – Johann Oskarsson Aug 16 '19 at 12:26