2

How would I quickly and safely* determine a random number within a range of 0 (inclusive) to r (exclusive)?

In other words, an optimized version of rejection sampling:

u32 myrand(u32 x)
{
    u32 ret = rand();

    while(ret >= x)
        ret = rand();

    return(ret);
}

*By safely, I mean a uniform distribution.

Kara
  • 6,115
  • 16
  • 50
  • 57
Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
  • This has been discussed before, with some good answers, but I can't find the link ATM. I do remember that (1) there is a web page with a detailed criticism of common ways to handle this, giving the "right" way, and (2) there is a powerful standard-ish library (either tr1, boost or C++0x) to handle various random number distributions with a decent (Mersenne Twistor based?) generator. –  Jun 17 '11 at 07:01
  • 1
    @steve314: You can make that Boost, TR1 _and_ C++0x. They all have `uniform_int` – MSalters Jun 17 '11 at 10:34
  • Link found - http://stackoverflow.com/questions/4195958/random-number-from-0-to-100 - see answers by Konrad Rudolph and Blastfurnace in particular. –  Jun 17 '11 at 16:24
  • -1 `rand()` is usually so far from cryptographically secure that making the distribution a little more uniform isn't really that helpful. – tc. Jun 20 '11 at 01:55
  • @tc `rand()` is a sample function which clearly shows the intent. Would you rather I confuse everyone by writing `BlumBlumShub()`? Besides, that's no reason to -1. – Mateen Ulhaq Jun 20 '11 at 03:50
  • Blatantly insecure "crypto" is *always* a -1. There are plenty of reasons to want a uniform distribution (why settle for "almost correct" when correct is not that difficult?), but someone will stumble upon this, see that it's "cryptographically secure", and use it to generate random passwords/password reset tokens/etc. – tc. Jun 22 '11 at 01:31

2 Answers2

6

Rejection sampling is the way to go if you want to have a uniform distribution on the result. It is notoriously difficult to do anything smarter. Using the modulo operator for instance results in an uneven distribution of the result values for any number that's not a power of 2.

The algorithm in you post however can be improved by discarding the unnecessary most significant bits. (See below.)

This is how the standard Java API implements Random.nextInt(int n):

public int nextInt(int n) {

    [...]

    if ((n & -n) == n)  // i.e., n is a power of 2
        return (int)((n * (long)next(31)) >> 31);

    int bits, val;
    do {
        bits = next(31);
        val = bits % n;
    } while (bits - val + (n-1) < 0);

    return val;
}

And in the commens you can read:

The algorithm is slightly tricky. It rejects values that would result in an uneven distribution (due to the fact that 231 is not divisible by n). The probability of a value being rejected depends on n. The worst case is n=230+1, for which the probability of a reject is 1/2, and the expected number of iterations before the loop terminates is 2.

aioobe
  • 413,195
  • 112
  • 811
  • 826
-1
u32 myrand(u32 x)
{
    return rand() % (x+1);
}

Since the question has been changed to include even distribution, this would need something more like this:

u32 myrand(u32 x)
{
   assert(x <= RAND_MAX && x > 0);
   int numOfRanges = (RAND_MAX % x);
   int maxAcceptedRand = numOfRanges * x;
   int randNumber;
   do
   {
      randNumber = rand();
   }
   while(randNumber <= maxAcceptedRand);
   return number / numOfRanges;
}
Darren B
  • 122
  • 3
  • 1
    This results in a non-uniform distribution of the result values, unless `x+1` is an even power of 2. – aioobe Jun 17 '11 at 06:02
  • The whole point of using rejection sampling is to avoid that. – Mateen Ulhaq Jun 17 '11 at 06:30
  • OK, fair enough. He has edited the question to include even distribution. I took safely to mean it must not have the potential to get into an almost endless loop as would happen in the rejection case on low values of x. – Darren B Jun 20 '11 at 01:15
  • Almost! You probably meant `numOfRanges = RAND_MAX/x`, but the correct way is more like `numOfRanges = (RAND_MAX+1)/x` but to reliably avoid overflow you need to do something more like `RAND_MAX/x + (RAND_MAX%x == x-1 ? 1 : 0)` (or just use uint64_t). – tc. Jun 20 '11 at 01:54
  • Doh, yes, divide not modulo. Sorry didn't try the code. I am at work and a had a couple of minutes free while compiling. The weak point of the whole thing is it's all based on the rand() function. It might give an even distribution across it's range but across multiple calls it doesn't behave like a true random source. – Darren B Jun 20 '11 at 02:22