4

So I have a function rand(0,1) that gives me random bits. I want to generalize this to rand(a,b), which gives me a random number in the range a to b.

My idea is to just calculate the number of bits in b - a and then append that many random bits together. I think this will work, but it's not going to be uniform. I feel like it will favor larger numbers as opposed to smaller numbers (numbers closer to a). I'm not really asking for a straight-up answer; just some help would be nice.

EDIT: This is my idea so far; I'm just not sure about the uniformity part.

    pseudo code:
    function rand_range(a, b):
        n = b - a
        sum = a
        for i in range(n):
            sum += rand(0,1)

        return sum
Igglyboo
  • 837
  • 8
  • 21
  • Why do you think the resulting number won't be uniform? – arknave Mar 03 '14 at 03:10
  • I'm not really sure; it just seems like it would favor higher values, as I'd have to call rand for as many bits as the range is, so the higher the range, the more rand calls and the higher the average value of my random number. Just my intuition; it may be completely flawed, I have no idea. – Igglyboo Mar 03 '14 at 03:12
  • does `rand(a,b)` return an integer between integers `a` and `b`? – Nishanth Mar 03 '14 at 03:22
  • 1
    I think the real issue is with arbitrary ranges. If a = 0 and b = 2 ** n - 1, you're good. But how to handle other types of ranges seems tricky. – FogleBird Mar 03 '14 at 03:30
  • Yes, the more bits you get, the higher the *average* value will be. In fact if the random bit generator is truly random then over a large number of samples the average will be very close to `(b-a)/2`. But with a random bit generator you should be just as likely to generate `00000` as `01101` or any other 5-bit sequence. – Jim Mischel Mar 03 '14 at 03:40
  • And in fact, the random numbers generated this way will follow a binomial distribution; for large n this comes close to a normal distribution. – Lutz Lehmann Mar 03 '14 at 11:13

5 Answers

2

For a uniform distribution you need rejection sampling:

Let's say you want to generate the numbers 4, 5, 6 (inclusive); then 2 bits are sufficient. Mapping: 00 -> 4, 01 -> 5, 10 -> 6, 11 -> reject.

pseudo code:
function rand_range(a, b):
    # bits needed to cover the b - a + 1 possible values
    n = ceil(log2(b - a + 1))

    while true:
        bits = []
        for i in range(n):
            bits.append(rand(0, 1))
        sum = a + ToBase10(bits)   # interpret the bits as an offset from a
        if sum <= b:               # reject out-of-range values and retry
            break

    return sum
Nishanth
  • 6,932
  • 5
  • 26
  • 38
2

Yes, it's not going to be uniform.

Consider the simple case of 3 bits:

0+0+0  0
0+0+1  1
0+1+0  1
0+1+1  2
1+0+0  1
1+0+1  2
1+1+0  2
1+1+1  3

It's easy to see that 1 and 2 are more likely to occur than 0 or 3.

This gets much more non-uniform as you increase the number of bits: 0 and the maximum can each occur in only one way, while the values in the middle can occur in many ways, so they come up the most.
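A quick way to see this empirically is to tally the sums of three random bits. Here is a small Python sketch; the single-bit source is stubbed out with the standard library purely so the snippet runs on its own:

    import random
    from collections import Counter

    def rand(a, b):
        # stand-in for the question's random-bit primitive
        return random.getrandbits(1)

    # sum three random bits many times and tally the results
    counts = Counter(sum(rand(0, 1) for _ in range(3)) for _ in range(80000))
    print(sorted(counts.items()))
    # roughly a 1:3:3:1 ratio -- 0 and 3 near 10000, 1 and 2 near 30000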


For a uniform distribution, the best approach I can think of involves throwing some generated numbers away.

Round b-a up to the closest power of 2 minus 1, then generate each bit individually and, if the result is greater than b-a, try again.

So, if b-a is 5, round up to 7, and generate the 3 bits involved to make a maximum number of 7:

000  0
001  1
010  2
011  3
100  4
101  5
110  6
111  7

Now, in the case of 6 or 7, throw them away and try again.

This can be done by using strings and concatenating a 0 or a 1 and converting to a number at the end, or, at each step, multiplying by 2 (so as to shift all the bits one position to the left) and adding 0 or 1.

At the end you'll still add the result to a.
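If it helps, here is a minimal Python sketch of that procedure, assuming rand(0, 1) is the question's bit source (stubbed out here so the snippet runs) and using bit_length() as a shortcut for rounding b - a up to the closest power of 2 minus 1:

    import random

    def rand(a, b):
        # stand-in for the question's random-bit primitive
        return random.getrandbits(1)

    def rand_range(a, b):
        n = (b - a).bit_length()                 # enough bits so that 2**n - 1 >= b - a
        while True:
            value = 0
            for _ in range(n):
                value = value * 2 + rand(0, 1)   # shift left, append a random bit
            if value <= b - a:                   # throw away out-of-range results and retry
                return a + value                 # add the result to a at the end

    print([rand_range(10, 15) for _ in range(10)])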

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • And if you want to minimize the number of thrown-away values see this answer: http://stackoverflow.com/a/20818831/5987 – Mark Ransom Mar 03 '14 at 03:59
  • Why throwing away and not rescaling? – Alexandru Barbarosie Mar 03 '14 at 04:08
  • @AlexandruBarbarosie Rescaling could work, but making it perfectly uniform could be a problem / impossible, although a slight favour towards certain numbers may be acceptable. – Bernhard Barker Mar 03 '14 at 04:12
  • If you have, let's say, a range from `0` to `1024`, you already need 10 bits, and any value above `10000 00000` should be discarded, so roughly half of the computations are wasted, which is kind of bad... – Alexandru Barbarosie Mar 03 '14 at 11:03
  • @AlexandruBarbarosie Yes, in the worst case you have around a 50% chance of failing, but failing twice is 25%, 3 times 12.5%, etc. - the probability of carrying on particularly long is very low, and I certainly don't expect that we'll be dealing with the worst case the whole time. – Bernhard Barker Mar 03 '14 at 11:14
  • @AlexandruBarbarosie, rescaling simply redistributes the errors, it doesn't eliminate them. Try it on this simple example and see for yourself. In practice you get many more bits than you need from your RNG so the number of values that must be discarded is minuscule. – Mark Ransom Mar 04 '14 at 02:49
  • @MarkRansom what kind of errors are you speaking about before rescaling, from what I see there aren't any, hence there is nothing to redistribute. The discarded values are up to 50%, you can see it in one of my above comments. – Alexandru Barbarosie Mar 04 '14 at 06:47
  • @AlexandruBarbarosie, the errors are in the unevenness of the distribution; see my link above for an example. This is a well known problem. As for the number of values discarded, yes it can approach 50% worst case, but my point was that you're not likely to approach the worst case. Although in the contrived example given in the question the worst case is much more likely. Again see my link above. – Mark Ransom Mar 04 '14 at 13:10
0

Well, not only is the result not uniform, but you can also fail to generate a value between (a, b), since it can happen that rand(0,1) always generates 0, yielding a number which is out of your range (if a > 0).

The problem arises because, for a range (0, 5) say, 0 has only one representation, 00000, while 1 has five: 10000, 01000, 00100, 00010, 00001. Realizing this, the straightforward solution is a bijective mapping, so the easiest approach is to treat your 0s and 1s as the bits of a number, because every number has a unique representation in base 2.

Hence the pseudo code:

const MAX_BITS = 10;
const MAX_VAL = 1023;      // 2^MAX_BITS - 1
fun rand_range(a, b) {
    sum = 0;
    for (i = 0; i < MAX_BITS; i++)
        sum += pow(2, i) * rand(0, 1);   // build a MAX_BITS-bit random number
    // rescale from [0, MAX_VAL] to [a, b]
    return (b - a) * sum / MAX_VAL + a;
}

MAX_BITS and MAX_VAL can be chosen from the limits of the data type that will be input to rand_range(), so that you can guarantee that every input will be rescaled properly.
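To see what the rescaling does to uniformity, here is a tiny enumeration in Python for b - a = 5 with a 3-bit source; note the pseudo code above leaves the final integer conversion unspecified, so round() is an assumption here:

    from collections import Counter

    MAX_VAL = 7          # 2**3 - 1, i.e. a 3-bit source
    a, b = 0, 5

    # enumerate every possible 3-bit value and see where rescaling sends it
    counts = Counter(round((b - a) * s / MAX_VAL) + a for s in range(MAX_VAL + 1))
    print(sorted(counts.items()))
    # [(0, 1), (1, 2), (2, 1), (3, 1), (4, 2), (5, 1)] -- 1 and 4 occur twice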

Alexandru Barbarosie
  • 2,952
  • 3
  • 24
  • 46
  • If you're rescaling the values would this be truly uniform? – Igglyboo Mar 03 '14 at 04:57
  • @lgglyboo a slight bias is possible, and it totally depends on `MAX_VAL` which should be `>=(b-a)` but at the same time as small as possible, in other words, the first power of 2 that is bigger than `b-a`, in this case the bias can be negligible. – Alexandru Barbarosie Mar 03 '14 at 10:57
0

I think the following is a better way to do it:

1. Find the number of bits needed to represent b - a (roughly log2(b - a)).
2. Generate that many bits at random and construct an integer from them.
3. If the number is greater than b - a, reject it and repeat step 2.
4. Otherwise return a plus that number.

Time complexity: if the random number generator is truly random, then on average you need at most two iterations to get a number in the range, hence T(a,b) = O(log(b-a)).

Here is a Java implementation:

import java.util.Random;

public class RNG {

    public static int rand(int a, int b) {
        int diff = b - a;
        Random r = new Random();
        // count the bits needed to represent b - a
        int bits = 0;
        while (diff > 0) {
            bits++;
            diff = diff / 2;
        }
        // rejection sampling: build a 'bits'-bit number, retry if out of range
        while (true) {
            int acc = 0;
            for (int i = 0; i < bits; i++) {
                acc = 2 * acc + r.nextInt(2); // shift left and append a random bit
            }
            if (acc <= b - a) {
                return a + acc;
            }
        }
    }

    public static void main(String[] args) {
        int a = 150;
        int b = 300;
        int[] freq = new int[b + 1];
        // draw 1000 samples and tally how often each value occurs
        for (int i = 0; i < 1000; i++) {
            int k = rand(a, b);
            freq[k]++;
        }
        System.out.println("freq:");
        for (int j = a; j <= b; j++) {
            System.out.println(j + " : " + freq[j]);
        }
    }
}
Vikram Bhat
  • 6,106
  • 3
  • 20
  • 19
0

In Python you could subclass the `random.Random` class to get the full interface, including `randint(a, b)`.

import random

class Random(random.Random):
    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return self.getrandbits(53) * 2**-53
    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        return sum(rand(0, 1) << r for r in range(k))

    def seed(self, *args, **kwargs): # unused methods
        return None
    def getstate(self, *args, **kwargs):
        raise NotImplementedError
    def setstate(self, *args, **kwargs):
        raise NotImplementedError

`x = rand(a, b)` can be expressed as `r = Random(); x = r.randint(a, b)`.

The advantage is that if you define the `random()` and `getrandbits()` methods correctly, the rest of the code is already tested and just works. The `_randbelow()` method shows how these primitives can be used to return a random int in the range [0, n); it could easily be extended to define `randint(a, b)`.
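As a quick usage sketch (the rand stub below just delegates to the standard library so the example runs; in the question's setting it would be the given bit source, and the Random subclass is the one defined above):

    def rand(a, b):
        # stand-in for the question's single-bit primitive
        return random.getrandbits(1)

    r = Random()                                     # the subclass defined above
    print([r.randint(150, 300) for _ in range(10)])  # uniform ints in [150, 300]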

jfs
  • 399,953
  • 195
  • 994
  • 1,670