0

Suppose we have a random generator function rand5 that generates a random number between 1 and 5, inclusive. I've seen that if you want to use this to generate a random number between 1 and 7, inclusive, that you should repeatedly compute

(5 * (rand5() - 1) + rand5())

to generate a random number between 1 and 30, inclusive, and repeat this process until a number between 1 and 21 is generated, then take the result mod seven.

Why not just directly compute a random number between 1 and 7, inclusive, by using this formula?

(rand5() + rand5() + rand5()) % 7 + 1
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
user3833308
  • 1,132
  • 3
  • 14
  • 39
  • Because `rand5() + rand5() - 3` can give -1? – c2huc2hu Sep 20 '17 at 17:54
  • removed that expression, how about the range 1-15 one ? – user3833308 Sep 20 '17 at 17:55
  • Because the second one isn't uniform. What people use is called rejection sampling, see wikipedia to understand how it works – c2huc2hu Sep 20 '17 at 17:57
  • why not second one is uniform ? it can produce 1-15 which is 1 away from 14. where as the other solutions 1-29 which is 8 away from 21 ? – user3833308 Sep 20 '17 at 17:59
  • Why don't you try it and see? Generate 7000 values according to that formula. Are they uniformly distributed? – John Coleman Sep 20 '17 at 18:02
  • @John sure I can do that. I am trying to understand it. Can you share some thoughts on why not this ? – user3833308 Sep 20 '17 at 18:04
  • 1
    Count how many times you can get each number using your expression (substituting 1-5 for each value gives you 5*5*5=125 expressions to evaluate). You can even write a triple for-loop to do this for you. That should explain what's going on fairly well. – Bernhard Barker Sep 20 '17 at 18:05
  • @John trying to understand the math behind it to generalize this for n and m – user3833308 Sep 20 '17 at 18:05
  • `rand5() + rand5() + rand5() - 1` --> `[-1 ...14]` and `[-1 ...14]%7 + 1` --> `[-1...6]`. (Assume `rand5()` makes [0...5]). – chux - Reinstate Monica Sep 20 '17 at 18:05
  • @chux. rand5() will return 1-5 (inclusive) lets assume that. I see than in that case we won't touch 1 at all – user3833308 Sep 20 '17 at 18:06
  • 1
    @user3833308 How then does `5 * rand5() + rand5()) - 1` make 1,2,3,4? – chux - Reinstate Monica Sep 20 '17 at 18:07
  • @chux got it. can you help generalize it to m & n ? – user3833308 Sep 20 '17 at 18:07
  • `(5 * rand5() + rand5()) - 1` --> `[5...29]` and `(rand5() + rand5() + rand5() - 1) % 7 + 1` --> `[1...7]` not even same range. How can they be expected to be univalent? Post is now quite unclear. Please edit. – chux - Reinstate Monica Sep 20 '17 at 18:09
  • 3
    You really should explain the problem from scratch here in your question. This is pretty cryptic for someone who's not familiar with it. – Bernhard Barker Sep 20 '17 at 18:10
  • @Dukeling my bad. updated the question – user3833308 Sep 20 '17 at 18:13
  • 1
    You can also search for the probabilities of getting different results when **rolling 2 dice** - that's a fairly common example of why adding uniform distributions don't lead to a uniform distribution. – Bernhard Barker Sep 20 '17 at 18:14
  • A fundamental problem is that no power of 5 is a multiple of 7. You can't partition 125 (the total number of outcomes in `rand5()` repeated 3 times) into 7 subsets of equal size. Not only does your particular formula fail, but *any* function of the form `f(rand5(),rand5(),rand5())` which maps to `1..7` will fail to yield a uniform random variable (though you should be able to do better than your current formula). Rejection sampling works by throwing away outcomes so that the number of remaining outcomes *is* a multiple of 7. – John Coleman Sep 20 '17 at 18:30
  • Duplicate, see https://stackoverflow.com/questions/137783/expand-a-random-range-from-1-5-to-1-7/842343#842343. – user515430 Sep 20 '17 at 19:36
  • summing random numbers with overlapping intervals is tricky because you alter the probability distribution. So the result might be in the range you want but the distribution might not preferring some numbers much more than the others see this **awesome Answer** [Understanding “randomness”](https://stackoverflow.com/a/3956538/2521214) – Spektre Sep 21 '17 at 06:46
  • Possible duplicate of [Expand a random range from 1–5 to 1–7](https://stackoverflow.com/questions/137783/expand-a-random-range-from-1-5-to-1-7) – Dave Sep 22 '17 at 03:41

1 Answers1

6

Here's a simpler, related question. Suppose you want to generate a number between 1 and 8, inclusive, and you have a pair of six-sided dice. What happens if you try to generate such a random number by rolling the pair of dice, taking the result mod eight, and then adding one? That is, what happens if you do something like the following?

int value = (rand6() + rand6()) % 8 + 1;

This will generate a random number in the right range, but it won't generate them uniformly. Specifically, the probabilities of getting different totals on the dice rolls aren't uniform:

 2:   1 / 36
 3:   2 / 36
 4:   3 / 36
 5:   4 / 36
 6:   5 / 36
 7:   6 / 36
 8:   5 / 36
 9:   4 / 36
10:   3 / 36
11:   2 / 36
12:   1 / 36

Taking these numbers, modding them by eight, adding one, and grouping them together gives the following probabilities of generating each number between 1 and 8:

 1:   5 / 36  (have to roll 8)
 2:   4 / 36  (have to roll 9)
 3:   4 / 36  (have to roll 2 or 10)
 4:   4 / 36  (have to roll 3 or 11)
 5:   4 / 36  (have to roll 4 or 12)
 6:   4 / 36  (have to roll 5)
 7:   5 / 36  (have to roll 6)
 8:   6 / 36  (have to roll 7)

Notice that these probabilities aren't uniform; you're more likely to get a 1, 7, or 8 than anything else.

This example isn't exactly what you're proposing, but it's the same idea. Adding up multiple uniformly random values does not give you a uniform distribution over the range of their sum, so if you start with such a non-uniform distribution and use mods to try to squish it into a smaller range, you'll get back a distribution over that range, but not necessarily a uniform one.

The more complex formula - which is a type of rejection sampling, by the way - generates a uniformly-random number over some range (1 - 30) repeats this process until it gets something between 1 - 21. The values produced in that range are then guaranteed to be uniformly random, and since 21 is a multiple of 7, modding by 7 guarantees a uniformly-random value between 1 - 7.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065