5

I am given a function rand5() that generates, with a uniform distribution, a random integer in the closed interval [1,5]. How can I use rand5(), and nothing else, to create a function rand7(), which generates integers in [1,7] (again, uniformly distributed) ?


  1. I searched stackoverflow, and found many similar questions, but not exactly like this one.
  2. My initial attempt was rand5() + 0.5*rand5() + 0.5*rand5(). But this won't generate integers from 1 to 7 with uniform probability. Any answers, or links to answers, are very welcome.
Chthonic Project
  • 8,216
  • 1
  • 43
  • 92
  • Additional note: in general, summing up rand5() will not work. As we sum over rand5(), we will begin to deviate from uniform distribution, and start approaching a normal distribution. – Chthonic Project Jul 03 '12 at 05:01
  • 2
    Looks like a duplicate for http://stackoverflow.com/questions/10464360/use-rand5-to-generate-rand7-with-the-same-probability – Mathias Jul 03 '12 at 05:29
  • @Mathias, no, these are not the same question, the one you linked is about being unable to generate pseudo-random numbers using C# – unkulunkulu Jul 03 '12 at 05:39
  • 2
    @unkulunkulu - It's not the same question, but it's the same problem, with a solution. (The question was about a bug in the implementation of the solution; the bug had nothing to do with the correctness of the approach.) – Ted Hopp Jul 03 '12 at 05:43
  • 1
    @TedHopp, what you said doesn't sound like 'a duplicate question' to me. In particular, there is no discussion about the possibility to build an actual algorithm (i.e. one finishing in a finite time). – unkulunkulu Jul 03 '12 at 05:50
  • 1
    @unkulunkulu - Any reasonable rejection sampling process for this problem will have probability 0 of not terminating. With pseudorandom number generation, I think the number of steps will actually be bounded (at least for for most combination of number generators and rejection processes). – Ted Hopp Jul 03 '12 at 06:11
  • @TedHopp so? The questions are not duplicate still. – unkulunkulu Jul 03 '12 at 18:06

4 Answers4

7

Note that a prefect uniform distribution cannot be achieved with a bounded number of draw5() invocations, because for every k: 5^k % 7 != 0 - so you will always have some "spare" elements.

Here is a solution with unbounded number of draw5() uses:

Draw two numbers, x1,x2. There are 5*5=25 possible outcomes for this.

Note that 25/7 ~= 3.57. Chose 3*7=21 combinations, such that each combination will be mapped to one number in [1,7], for all other 4 numbers - redraw.

For example:

(1,1),(1,2),(2,1) : 1
(3,1),(1,3),(3,2): 2
(3,3),(1,4),(4,1): 3
(2,4),(4,2)(3,4): 4
(4,3), (4,4), (1,5): 5
(5,1), (2,5), (5,2) : 6
(5,3), (3,5), (4,5) : 7
(5,4),(5,5),(2,3), (2,2) : redraw
unkulunkulu
  • 11,576
  • 2
  • 31
  • 49
amit
  • 175,853
  • 27
  • 231
  • 333
  • we have a possibility to draw infinitely (with a probability of 0) :) Although I think that that is impossible to implement without such a detail :) Oh, you have the proof about impossibility, then clean +1 :) – unkulunkulu Jul 03 '12 at 05:21
  • @unkulunkulu: Read my edit (last sentence) why a perfect uniform distribution cannot be done with bounded number of draw5 ops – amit Jul 03 '12 at 05:22
  • I would suggest you move that sentence to the beginning as this is what author is hoping for (a perfect solution mathematically) – unkulunkulu Jul 03 '12 at 05:24
4

Here's a simple way:

  1. Use rand5() to generate a sequence of three random integers from the set { 1, 2, 4, 5 } (i.e., throw away any 3 that is generated).
  2. If all three numbers are in the set { 1, 2 }, discard the sequence and return to step 1.
  3. For each number in the sequence, map { 1, 2} to 0 and { 4, 5 } to 1. Use these as the three bit values for a 3-bit number. Because the bits cannot all be 0, the number will be in the range [1, 7]. Because each bit is 0 or 1 with equal probability, the distribution over [1, 7] should be uniform.
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • good answer, I wonder how this generalizes if the starting number was y instead of 5 and the target number was x instead of 7... – hackartist Jul 03 '12 at 05:26
  • Like amit's solution, this is not guaranteed to be a finite process. – Ted Hopp Jul 03 '12 at 05:26
  • @hackartist - I think this kind of rejection sampling should be pretty easy to generalize to any pair of ranges. – Ted Hopp Jul 03 '12 at 05:30
  • 1
    @hackartist, and also if all the prime factors of `x` are present in the set of the prime factors of `y`, then there exists a finite process. – unkulunkulu Jul 03 '12 at 05:36
2

ok I had to think about it for a while but it is actually not that hard. Imagine instead of rand5 you had rand2 which either outputs 0 or 1. You can make rand2 our of rand5 by simply doing

rand2() {
    if(rand5() > 2.5) return 1
    else return 0
}

now using rand2 multiple times do a tree to get rand7. For example if you start rand7 can be in [1,2,3,4,5,6,7] after a throw of rand2 which gives 0 you now subset to [1,2,3,4] and after another throw or rand2 which is 1 you subset to [3,4] and a final throw of 1 gives the output of rand7 to be 4. In general this tree trick can work to take a rand2 and map to randx where x is any integer.

hackartist
  • 5,172
  • 4
  • 33
  • 48
  • rand2 will return 0 when rand5 returns 1 or 2; it will return 1 when rand5 returns 3, 4, or 5. You need rand2 to return 0 or 1 with equal probability. – Ted Hopp Jul 03 '12 at 05:23
  • 1
    True, well you could still make the rand2 function by rethrowing rand5 if it was 3 – hackartist Jul 03 '12 at 05:25
  • I had started along this line, and was stuck at the remark made by Ted Hopp. Never thought of reusing rand2() if the first throw gave me 3. Ugh, and double ugh! I like this approach a lot, because behind my question lies the generalization, exactly as noted by @hackartist, and this is a finite process in the sense that Pr(process never ends) = 0. – Chthonic Project Jul 03 '12 at 08:09
2

Here's one meta-trick which comes in handy for lots of these problems: the bias is introduced when we treat the terms differently in some fashion, so if we treat them all the same at each step and perform operations only on the set, we'll stay out of trouble.

We have to call rand5() at least once (obviously!), but if we branch on that bad things happen unless we're clever. So instead let's call it once for each of the 7 possibilities:

In [126]: import random

In [127]: def r5():
   .....:     return random.randint(1, 5)
   .....: 

In [128]: [r5() for i in range(7)]
Out[128]: [3, 1, 3, 4, 1, 1, 2]

Clearly each of these terms was equally likely to be any of these numbers.. but only one of them happened to be 2, so if our rule had been "choose whichever term rand5() returns 2 for" then it would have worked. Or 4, or whatever, and if we simply looped long enough that would happen. So there are lots of way to come up with something that works. Here (in pseudocode -- this is terrible Python) is one way:

import random, collections

def r5():
    return random.randint(1, 5)

def r7():
    left = range(1, 8)
    while True:
        if len(left) == 1: 
            return left[0]
        rs = [r5() for n in left]
        m = max(rs)
        how_many_at_max = rs.count(m)
        if how_many_at_max == len(rs):
            # all the same: try again
            continue
        elif how_many_at_max == 1:
            # hooray!
            return left[rs.index(m)]
        # keep only the non-maximals
        left = [l for l,r in zip(left, rs) if r != m]

which gives

In [189]: collections.Counter(r7() for _ in xrange(10**6))
Out[189]: Counter({7: 143570, 5: 143206, 4: 142827, 2: 142673, 6: 142604, 1: 142573, 3: 142547})
DSM
  • 342,061
  • 65
  • 592
  • 494
  • A very interesting approach, but: I don't quite see why it should work (I mean, we can in one of the other suggested algorithms instead of throwing away some particular numbers just count them as one of the 7 numbers, in order for example, it would result in a uniform distribution, but not random, so this sample doesn't prove anything. Also, this solution looks to throw away even more numbers than others. – unkulunkulu Jul 03 '12 at 05:49
  • @unkulunkulu: I made no claims to efficiency. What's troubling you about the approach, though? At every iteration, each term has an equal chance of being returned or removed. – DSM Jul 03 '12 at 05:56
  • What bugs me is that the algorithm is kind of complicated, first you draw a basic idea (which is not too clear either), then you present a piece of code without trying to prove that it will work. And I don't see a reason for these complications: it is not faster, it is not simpler, it's not guaranteed to return in a finite time. It's interesting purely because it's not clear whether it's correct or not. – unkulunkulu Jul 03 '12 at 06:01
  • It's certainly not faster; I agree it's not guaranteed to return in finite time; but I think it *is* simpler in many respects, as it doesn't depend upon arithmetic features of 5 and 7, and generalizes immediately to other cases. YMMV, I guess. – DSM Jul 03 '12 at 06:10
  • @unkulunkulu: in Python, `m = max(rs)` sets m equal to the maximum `value`, not to the index of the maximum value. You will note that we then test for how many values are at the maximum, and each branch is fair. – DSM Jul 03 '12 at 06:20
  • 1
    yeah, sorry, I was mistaken, my test was incorrect (didn't notice that the dictionary changes the order of the keys from time to time), the algorithm is correct :) – unkulunkulu Jul 03 '12 at 06:23