How to get a random number from 0 to N-1 with a "same probability" in C?

Question

I know this might be a "old" question, but I want to focus on the probability.

My first question is: in C, rand() will give a number from 0 to RAND_MAX, does each number in this interval have the same probability to be chosen by rand()?

The second question: if rand() lets each number from 0 to RAND_MAX to have the same (or approximately same) probability to be chosen, when I want to get a random number from 0 to N-1 (N-1 < RAND_MAX), I'll do this generally:

rand()%N

But if RAND_MAX is NOT the multiple of N, the probability of random number chosen from 0 to N-1 might not be same

For instance, suppose RAND_MAX=150 and N=100, when I do rand()%100, the number from 0 to 49 will have a higher probability to be chosen than the number from 50 to 99 because 150 is not the multiple of 100.

Is there a algorithm or function in C, which can let each random number have the same probability to be chosen?

You can divide by double: rand()/(double)RAND_MAX and the multiply by your range, but this has some other problems, maybe someone can point them out. — this, Feb 01 '14 at 15:55
Just replace 'possibility' with 'probability', and the answer to both questions is pretty much Yes, given the theoretical limitations of any pseudo RNG (random number generator). — barak manos, Feb 01 '14 at 15:56
I don't often recommend Microsoft anything, but you should watch: http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful — Grady Player, Feb 01 '14 at 16:00
What's wrong with `rand() / (double)RAND_MAX` to scale the random numbers into `[0,1]` range. This way should avoid bias — Brandin, Feb 01 '14 at 16:05
Example of why you should never use `rand()%N` - http://stackoverflow.com/a/20267526/2016408 (applies to OSx, but gives you a hunch on how these things can break) — Leeor, Feb 01 '14 at 16:40

score 2 · Answer 1 · answered Feb 01 '14 at 15:58

Leaving aside the fact that rand()%N is a very bad way to get a random number in the range 0..N-1, the question you ask is quite simple. Find the largest number, M say, that satisfies both M <= RAND_MAX and M % N == 0. Then when you call rand(), reject a value if it is >= M and call rand() again until you get a value that is < M.

But this particular nuance is pointless because rand()%N will be hopelessly biased. You need to use as many of the bits returned by in rand() as you possibly can.

score 2 · Answer 2 · edited May 23 '17 at 11:49

Assuming rand() itself is evenly distributed (not always a good assumption.), call rand() again as needed.

Based on Trying to take 1 random digit at a time

#include <stdlib.h>
int rand_Flat_Distribution_0_to_Nm1(int N) {
  assert(N <= RAND_MAX);
  assert(N > 0);
  int rmax = RAND_MAX - (RAND_MAX % N) - 1;
  int r;
  while ((r = rand()) > rmax);
  return r%N;
}

Example: If RAND_MAX was 32767 and N was 100, rmax would have the value of 32699. Any random value in the range 32700 to 32767 would get tossed and a new random value would be fetched, eliminated the bias %N typically causes.

This does not compensate for deficiencies in rand(). C does not specify the quality of rand(), just that it generates values 0 to RAND_MAX and RAND_MAX is a least 32767.

For values of N greater than RAND_MAX, a different solution is needed.

score 0 · Answer 3 · answered Feb 01 '14 at 15:56

There are no certainties about how random your results will truly be when using rand(), it's highly dependent on what the system does to provide random numbers. There are various 3rd party pseudo random number generator (PRNG) packages available if you search, but if randomness is extremely important, a solution involving hardware is probably better.

You're right in that simply modding off the excess gives bias towards values. You can avoid this by converting your random result to a floating point by dividing by the largest value the random generator can provide, and then by multiplying that by the number of elements in the range of values you're willing to deal with. If your range did not begin with 0, you would then add in the base value you expect.

The C specs don't mandate a particular implementation, but the constraints given over the `rand/srand` couple almost guarantee that it will be an LCG. But most importantly, passing through FP to get back to integers will still give you a biased result, although in a less obvious way. — Matteo Italia, Feb 01 '14 at 17:51

Vikram Bhat · Answer 4 · 2014-02-02T12:03:15.807

here is way to do it evenly using binary representation : -

1. generate logN bits using rand()%2
2. construct decimal number out of them.
3. check with N
4. if less than N return else repeat 1 - 3.

Note:- I dont think there is any other way than rejection to generate even distribution because reusing the once which are greater than or equal to N will always cause in-balance in probabilities

Time Complexity :- If truly random generator is used then there will be on average only two runs of the loop 1-3 which would mean 2*logN which O(logN*Trand) for average case

Here is a Java implementation to support my algorithm : -

public static void getrand(int n,int k) {

        int range = n-1;
        int bits = 0;
        Random r = new Random();
        while(range>0) {
            range = range>>1;
            bits++;
        }
        System.out.println("bits: "+bits);
        //int[] freq = new int[n];
        int count = 0;
        for(int i=0;i<k;i++) {
            int steps  = 0;
            while(true) {
                int randomNum = 0;
                steps++;
                for(int j=0;j<bits;j++) {

                    randomNum = randomNum<<1|(r.nextInt(2));
                }
                if(randomNum<n) {
                    System.out.println("Random Number: "+randomNum+" steps: "+steps);
                    //freq[randomNum]++;
                    count = count + steps;
                    break;
                }

            }
        }
        System.out.println("average steps: "+(float)(count)/k);

    }

n = number less than which you need to generate values. k = total number random number you need to generate.

@OliCharlesworth It doesnt really matter if i construct it using %10 or %2 because it will take same time because it take on average 5*log10(N) for constructing decimal number as compared it takes 1.5*log2(N) for binary number which when brought to base10 gives 5*log10(N) . Moreover %2 is more likely to give better distribution than %10. — Vikram Bhat, Feb 02 '14 at 11:27

Seva Alekseyev · Answer 5 · 2014-02-01T16:47:02.183

-1

In theory, such an algorithm is impossible for an arbitrary value of N.

If you call rand() X times, there are RAND_MAX^X possible outcomes. Unless the prime factors of N are all also factors of RAND_MAX, there's no way RAND_MAX^X would be divisible by N. They don't have to be divible by one another for even distribution to be possible, but there should exist such Y so that RAND_MAX^Y is divisible by N. So RAND_MAX=12 and N=9 works (12^2 is divisible by 9), but RAND_MAX=12 and N=10 doesn't (no matter what X is, 12^X won't be divisible by 10).

That said, the larger X is, the closer you can get to even distribution with the modulo formula. If you call rand() X times, calculate rand[0]*RAND_MAX^(X-1) + rand[1]*RAND_MAX^(X-2) + ... rand[X-1], and take that modulo N, as X tends to infinity, the probability distribution would tend to even. That's the best you can get.

edited Feb 01 '14 at 16:47

answered Feb 01 '14 at 15:57

Seva Alekseyev

59,826
25
160
281

It seems the OP is already aware that the modulo approach is flawed; they are asking for a non-flawed algorithm. – Oliver Charlesworth Feb 01 '14 at 16:34
1

My very point is that with arbitrary N, there can't be one. – Seva Alekseyev Feb 01 '14 at 16:45
2

Of course there is; just use a rejection-based approach. – Oliver Charlesworth Feb 01 '14 at 16:46
You mean - reject some outcomes and repeat the process? In that case, it'd have theoretically unbounded max run time. – Seva Alekseyev Feb 01 '14 at 16:48
4

Indeed, but the probability of running forever is asymptotically zero. In practice, this is fine, and is the approach used by most libraries. – Oliver Charlesworth Feb 01 '14 at 16:50

How to get a random number from 0 to N-1 with a "same probability" in C?

5 Answers5