The premise is incorrect, assuming you want an unbiased random number. And because the premise is incorrect, there is no analogous solution. Or, better said, because the provided code produces a biased sample, any analogous code for sampling outside of the range will also produce a biased sample.
In C, I can use this simple expression to generate a random number in a range:
rand() % (max_number + 1 - minimum_number) + minimum_number
That will work, more or less, if the size of the range is small relative to the size of the range of possible values returned by rand()
. It will only produce a truly unbiased valuebif rand()
itself is unbiased and the size of the desired range is a factor of RAND_MAX + 1
. Since RAND_MAX + 1
is commonly a power of 2, the only range sizes which can produce an unbiased selection are also powers of two.
It's easy to see that with the pigeon-hole principle. Imagine that there are s
pigeon-holes, each corresponding to a value in the desired range. (s
must be max - min + 1
, of course.) Now each of the RAND_MAX + 1
values which rand()
could produce must be put into one of the pigeon-holes. Since each of those values has equal probability, and the probability of a pigeon-hole being selected is the sum of the probabilities of its contents, it follows that an unbiased outcome requires that all the pigeon-holes have the same number of values, which is only possible if s
is a factor of the number of possible values.
In the not uncommon case that RAND_MAX + 1
is 32768 (Windows, for example), if s
were 6 (a roll of a die), then there would be 5461 values in each of four pigeon-holes, and 5462 values in the other two. That bias is not enormous but it wouldn't pass an inspection by the Gaming Commissioner.
The situation is more dramatic when the desired range is close to RAND_MAX + 1
, which is the situation you will produce if you exclude a small range. In that case, most of the pigeon-holes will have just one value, and a handful of lucky pigeon-holes will each have two values, and therefore be twice as likely to be picked.
The simplest fix is "rejection sampling" which involves a loop. We reject s % (RAND_MAX + 1)
possible returns of rand()
by calling rand()
again if one of these values comes up. (It doesn't matter which values are rejected but it is simple to reject the ones less than s % (RAND_MAX + 1)
.) In the worst case, just under half of the possible return values will be rejected and the loop will run once on average. In more common cases, it will hardly ever run and branch prediction will reduce its cost to very little.