3

In C, I can use this simple expression to generate a random number in a range:

rand() % (max_number + 1 - minimum_number) + minimum_number

Is there a similar, non loop based expression that can generate a random number outside a range? For example if my range is from 3 to 5, I would want a random number that would either be in the range 0 to 2, or 6 to RAND_MAX.

klutt
  • 30,332
  • 17
  • 55
  • 95
edt11x
  • 218
  • 2
  • 8
  • Yes,you can! Hint: use unsigned. (except for thefact that `rand()% something` will be biased) – wildplasser Jun 21 '18 at 23:34
  • 1
    You can do it, but if your range is small compared to `RAND_MAX`, your results are going to be pretty badly biased, and the usual way to correct that bias will involve using a loop. (And if your range, the expected number of trips through the loop will also be small.) – Steve Summit Jun 21 '18 at 23:45
  • 1
    interesting blog post on bias and rejection sampling: http://dimitri.xyz/random-ints-from-random-bits/ – Mitch Wheat Jun 22 '18 at 00:08
  • Note that `RAND_MAX` is implementation dependent and only guaranteed to be 32767. So, if your `max_number` and `minimum_number` are unrestricted 32 or 64 bit numbers, you will need a more sophisticated solution to generate the number and then adjust it to be outside your range. – bruceg Jun 22 '18 at 00:22
  • 1
    Think of the ranges of numbers as a closed circle. Now the problem of excluding a range is the same as problem of finding numbers in a range. – Ajay Brahmakshatriya Jun 22 '18 at 05:32
  • @Ajay: Absolutely correct. But if the range is close to but not equal to the range of your random number generator, producing a uniform distribution without rejection sampling is, to say the least, tricky. – rici Jun 22 '18 at 14:08

3 Answers3

4

This would work:

int r = rand() % (maxval+1 - rangesize);
if(r>=rangemin)
    r+=rangesize;

From your example, rangesize would be 3 since there are three numbers in the range you want to avoid, and rangemin would be 3 because it is the smallest number in that range.

This would produce random numbers in the range [0, maxval] except numbers in the range [rangemin, rangemin+rangesize-1]

However, do note that using rand() % x often gives a bad distribution, so if that matters you need to think about that. Thanks to rici for pointing this out. See his answer for more information about this.

But given that you have a function r(lo, hi) that generates uniformly distributed numbers from lo to hi, the transformation if(r>=rangemin) r+=rangesize will work just fine and not mess up the distribution.

Relevant links:

Generating a uniform distribution of INTEGERS in C

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

klutt
  • 30,332
  • 17
  • 55
  • 95
  • Maybe because the result will not be randomly distributed. Or rather that for a large sample the distribution will be less even than the function rand() – london-deveoper Jun 22 '18 at 02:20
  • 2
    In the OP, it says "if my range is from 3 to 5, I would want a random number that would either be in the range 0 to 2, or 6 to RAND_MAX." Feeding that precise use case into this solution, we get a function which produces 0, 1 and 2 twice as often as a number in the rest of the range, which is a significant deviation from a uniform distribution. – rici Jun 22 '18 at 03:26
  • @rici Hmmm, I did a test now and you seem to be right. – klutt Jun 22 '18 at 12:27
  • @klutt: How's this: https://ideone.com/XFyN3O ? (Curiously, it's the case where SakoDaemon's solution would work perfectly. A stopped clock is right twice a day.) – rici Jun 22 '18 at 14:02
  • @rici Well I did a test on my own and confirmed that you are right. Now I'm trying to understand the problem and see if there's an easy fix. – klutt Jun 22 '18 at 14:03
  • @rici I read your answer. Am I correct if I say that the problem has nothing to do with the last two lines, but only the first line? And that it would be the same problem if you wanted random numbers in the interval continous `[0, maxval-rangesize]`? – klutt Jun 22 '18 at 19:42
  • Yes to both questions. – rici Jun 22 '18 at 20:37
2

The premise is incorrect, assuming you want an unbiased random number. And because the premise is incorrect, there is no analogous solution. Or, better said, because the provided code produces a biased sample, any analogous code for sampling outside of the range will also produce a biased sample.

In C, I can use this simple expression to generate a random number in a range:

 rand() % (max_number + 1 - minimum_number) + minimum_number

That will work, more or less, if the size of the range is small relative to the size of the range of possible values returned by rand(). It will only produce a truly unbiased valuebif rand() itself is unbiased and the size of the desired range is a factor of RAND_MAX + 1. Since RAND_MAX + 1 is commonly a power of 2, the only range sizes which can produce an unbiased selection are also powers of two.

It's easy to see that with the pigeon-hole principle. Imagine that there are s pigeon-holes, each corresponding to a value in the desired range. (s must be max - min + 1, of course.) Now each of the RAND_MAX + 1 values which rand() could produce must be put into one of the pigeon-holes. Since each of those values has equal probability, and the probability of a pigeon-hole being selected is the sum of the probabilities of its contents, it follows that an unbiased outcome requires that all the pigeon-holes have the same number of values, which is only possible if s is a factor of the number of possible values.

In the not uncommon case that RAND_MAX + 1 is 32768 (Windows, for example), if s were 6 (a roll of a die), then there would be 5461 values in each of four pigeon-holes, and 5462 values in the other two. That bias is not enormous but it wouldn't pass an inspection by the Gaming Commissioner.

The situation is more dramatic when the desired range is close to RAND_MAX + 1, which is the situation you will produce if you exclude a small range. In that case, most of the pigeon-holes will have just one value, and a handful of lucky pigeon-holes will each have two values, and therefore be twice as likely to be picked.

The simplest fix is "rejection sampling" which involves a loop. We reject s % (RAND_MAX + 1) possible returns of rand() by calling rand() again if one of these values comes up. (It doesn't matter which values are rejected but it is simple to reject the ones less than s % (RAND_MAX + 1) .) In the worst case, just under half of the possible return values will be rejected and the loop will run once on average. In more common cases, it will hardly ever run and branch prediction will reduce its cost to very little.

rici
  • 234,347
  • 28
  • 237
  • 341
0

A simple solution would be:

int b = rand() % 2, nr;
if (b)
    nr = rand() % min;  // [0, min - 1]
else
    nr = rand() % (RAND_MAX - max) + max + 1;  // [max + 1, RAND_MAX]

To avoid biasing for certain intervals (meaning different probabilities for different numbers), you'll still have to use loops. You may want to check the answer provided here for that.

Edit: As klutt pointed out, this solution adds some bias of its own since the obtained number has a 50% chance to be lower than your min and a 50% chance to be higher than your max, regardless of size difference between the two intervals. Therefore, unless you specifically want this behaviour, the other solution is better for minimal bias.

SakoDaemon
  • 973
  • 1
  • 6
  • 21