This example code illustrates that std::rand is a case of legacy cargo cult balderdash that should raise your eyebrows every time you see it.
There are several issues here:
The contract people usually assume (even the poor hapless souls who don't know any better and won't think of it in precisely these terms) is that rand samples from the uniform distribution on the integers in 0, 1, 2, …, RAND_MAX, and each call yields an independent sample.
The first problem is that the assumed contract, independent uniform random samples in each call, is not actually what the documentation says; in practice, implementations historically failed to provide even the barest simulacrum of independence. For example, C99 §7.20.2.1 ‘The rand function’ says, without elaboration:
The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX.
This is a meaningless sentence, because pseudorandomness is a property of a function (or family of functions), not of an integer, but that doesn't stop even ISO bureaucrats from abusing the language. After all, the only readers who would be upset by it know better than to read the documentation for rand for fear of their brain cells decaying.
A typical historical implementation in C works like this:
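/* Illustration only: in a real program these names clash with <stdlib.h>. */
static unsigned int seed = 1;

static void
srand(unsigned int s)
{
    seed = s;
}

static int
rand(void)
{
    /* Linear congruential step; unsigned arithmetic wraps, which is well defined. */
    seed = (seed*1103515245 + 12345) % ((unsigned long)RAND_MAX + 1);
    return (int)seed;
}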
This has the unfortunate property that even though a single sample may be uniformly distributed under a uniform random seed (which depends on the specific value of RAND_MAX), it alternates between even and odd integers in consecutive calls whenever RAND_MAX + 1 is even, because the multiplier and the increment are both odd. After
int a = rand();
int b = rand();
the expression (a & 1) ^ (b & 1) yields 1 with 100% probability, which is not the case for independent random samples on any distribution supported on even and odd integers. Thus, a cargo cult emerged that one should discard the low-order bits to chase the elusive beast of ‘better randomness’. (Spoiler alert: This is not a technical term. This is a sign that whosever prose you are reading either doesn't know what they're talking about, or thinks you are clueless and must be condescended to.)
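The parity alternation is easy to check empirically. The following is a minimal sketch, not any particular libc's code: ToyLcg, kToyRandMax, and count_parity_flips are hypothetical names, and kToyRandMax = 2^31 - 1 is an assumed stand-in for RAND_MAX.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for RAND_MAX (2^31 - 1), chosen for illustration.
constexpr std::uint32_t kToyRandMax = 0x7fffffffu;

// Toy generator mirroring the historical implementation shown above.
struct ToyLcg {
    std::uint32_t state = 1;

    std::uint32_t next() {
        // The product wraps modulo 2^32, then is reduced modulo kToyRandMax + 1.
        state = (state * 1103515245u + 12345u) % (kToyRandMax + 1u);
        return state;
    }
};

// Counts how many of `pairs` consecutive output pairs differ in their low bit.
int count_parity_flips(std::uint32_t seed, int pairs) {
    assert(pairs >= 0);
    ToyLcg gen;
    gen.state = seed;
    int flips = 0;
    std::uint32_t prev = gen.next();
    for (int i = 0; i < pairs; ++i) {
        std::uint32_t cur = gen.next();
        flips += static_cast<int>((prev ^ cur) & 1u);
        prev = cur;
    }
    return flips;
}
```

For this toy generator every consecutive pair flips parity, so count_parity_flips(seed, n) comes out equal to n for any seed, where independent samples would flip only about half the time.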
The second problem is that even if each call did sample independently from a uniform random distribution on 0, 1, 2, …, RAND_MAX, the outcome of rand() % 6 would not be uniformly distributed in 0, 1, 2, 3, 4, 5 like a die roll, unless RAND_MAX is congruent to -1 modulo 6. Simple counterexample: If RAND_MAX = 6, then from rand(), all outcomes have equal probability 1/7, but from rand() % 6, the outcome 0 has probability 2/7 while all other outcomes have probability 1/7.
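The bias can be tallied directly by counting how many generator outputs map to each residue. This is a small sketch for illustration; residue_counts is a hypothetical helper and is only meant for small values of rand_max.

```cpp
#include <array>
#include <cassert>
#include <climits>

// Tallies how many values in 0..rand_max land on each residue modulo 6.
// Intended for small rand_max; the loop visits every value once.
std::array<unsigned long, 6> residue_counts(unsigned long rand_max) {
    assert(rand_max < ULONG_MAX);  // otherwise the loop below never ends
    std::array<unsigned long, 6> counts{};
    for (unsigned long v = 0; v <= rand_max; ++v)
        ++counts[v % 6];
    return counts;
}
```

With rand_max = 6 the tallies are 2, 1, 1, 1, 1, 1, matching the 2/7 versus 1/7 split above. With rand_max = 32767, the smallest value the C standard permits for RAND_MAX, residues 0 and 1 each get 5462 preimages while the others get 5461.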
The right way to do this is with rejection sampling: repeatedly draw an independent uniform random sample s from 0, 1, 2, …, RAND_MAX, and reject (for example) the outcomes 0, 1, 2, …, ((RAND_MAX + 1) % 6) - 1. If you get one of those, start over; otherwise, yield s % 6.
unsigned int s;
while ((s = rand()) < ((unsigned long)RAND_MAX + 1) % 6)
    continue;
return s % 6;
This way, the number of outcomes from rand() that we accept is divisible by 6, and each possible outcome from s % 6 is obtained by the same number of accepted outcomes from rand(), so if rand() is uniformly distributed then so is s % 6. There is no bound on the number of trials, but the expected number is less than 2, and the probability of needing more than n trials falls exponentially with n.
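As a complete function, the rejection loop might look like the sketch below. The names roll_d6 and roll_d6_with_rand are hypothetical, and the source of samples is passed in as a callable so the logic can be exercised with something other than rand(); uniformity of the result still rests on the assumption that the callable yields independent uniform samples in 0..max_value.

```cpp
#include <cassert>
#include <cstdlib>

// Returns a value in 0..5 from `draw`, a callable assumed to return
// independent uniform samples in 0..max_value (an idealized rand()).
template <typename Draw>
unsigned roll_d6(Draw draw, unsigned long max_value) {
    // Need at least six outcomes, and max_value + 1 must not wrap to zero.
    assert(max_value >= 5 && max_value + 1 != 0);
    const unsigned long reject_below = (max_value + 1) % 6;
    unsigned long s;
    do {
        s = draw();
    } while (s < reject_below);  // retry on the few surplus outcomes
    return static_cast<unsigned>(s % 6);
}

// Convenience wrapper over std::rand; inherits every weakness of std::rand.
unsigned roll_d6_with_rand() {
    return roll_d6([] { return static_cast<unsigned long>(std::rand()); },
                   static_cast<unsigned long>(RAND_MAX));
}
```

Fed a source that cycles through 0, 1, …, 6 with max_value = 6, the function rejects 0 and maps the remaining six values to six distinct results, which is the counterexample above repaired.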
The choice of which outcomes of rand() you reject is immaterial, provided that you map an equal number of them to each integer below 6. The code at cppreference.com makes a different choice, because of the first problem above: nothing is guaranteed about the distribution or independence of outputs of rand(), and in practice the low-order bits exhibited patterns that don't ‘look random enough’ (never mind that the next output is a deterministic function of the previous one).
Exercise for the reader: Prove that the code at cppreference.com yields a uniform distribution on die rolls if rand() yields a uniform distribution on 0, 1, 2, …, RAND_MAX.
Exercise for the reader: Why might you prefer one or the other subsets to reject? What computation is needed for each trial in the two cases?
A third problem is that the seed space is so small (srand takes an unsigned int, typically 32 bits) that even if the seed is uniformly distributed, an adversary armed with knowledge of your program and one outcome but not the seed can readily recover the seed and predict subsequent outcomes, which makes them seem not so random after all. So don't even think about using this for cryptography.
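To make the attack concrete, here is a sketch of a brute-force seed search against the toy generator from earlier, observing only naive die rolls. Everything here is hypothetical: toy_next, toy_rolls, and candidate_seeds are illustrative names, the modulus 2^31 is an assumed RAND_MAX + 1, and seed_limit lets the search be restricted to keep a demonstration quick.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One step of the toy generator (assumes RAND_MAX + 1 == 2^31).
std::uint32_t toy_next(std::uint32_t &state) {
    state = (state * 1103515245u + 12345u) % 0x80000000u;
    return state;
}

// Produces `count` naive die rolls in 0..5 from a given seed.
std::vector<int> toy_rolls(std::uint32_t seed, std::size_t count) {
    std::vector<int> rolls;
    std::uint32_t state = seed;
    for (std::size_t i = 0; i < count; ++i)
        rolls.push_back(static_cast<int>(toy_next(state) % 6));
    return rolls;
}

// Returns every seed below seed_limit whose rolls match the observed ones.
std::vector<std::uint32_t> candidate_seeds(const std::vector<int> &observed,
                                           std::uint32_t seed_limit) {
    assert(!observed.empty());
    std::vector<std::uint32_t> matches;
    for (std::uint32_t seed = 0; seed < seed_limit; ++seed)
        if (toy_rolls(seed, observed.size()) == observed)
            matches.push_back(seed);
    return matches;
}
```

Any seed that survives the filter reproduces the observed rolls exactly, and the true seed is always among the survivors; the more rolls observed, the fewer survivors remain. Searching all 2^32 seeds this way is a matter of patience on ordinary hardware, not of cleverness.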
You can go the fancy overengineered route and use C++11's std::uniform_int_distribution class with an appropriate random device and your favorite random engine like the ever-popular Mersenne twister std::mt19937 to play at dice with your four-year-old cousin, but even that is not going to be fit for generating cryptographic key material. The Mersenne twister is a terrible space hog too, with a multi-kilobyte state wreaking havoc on your CPU's cache and an obscene setup time, so it is bad even for, e.g., parallel Monte Carlo simulations with reproducible trees of subcomputations; its popularity likely arises mainly from its catchy name. But you can use it for toy dice rolling like this example!
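For toy dice rolling, the standard-library route might look like the following sketch. roll_die and make_engine are hypothetical helper names; seeding from a single std::random_device value fills only a sliver of the engine's state, which is an accepted simplification here and one more reason this is unfit for key material.

```cpp
#include <cassert>
#include <random>

// Rolls a die with faces 1..sides using the supplied engine.
int roll_die(std::mt19937 &engine, int sides = 6) {
    assert(sides >= 1);
    std::uniform_int_distribution<int> die(1, sides);
    return die(engine);
}

// Builds an engine seeded with one value from std::random_device.
// Good enough for games; not a substitute for a cryptographic generator.
std::mt19937 make_engine() {
    std::random_device device;
    return std::mt19937(device());
}
```

A caller would create one engine with make_engine() and reuse it for every roll, rather than constructing and seeding a new engine each time.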
Another approach is to use a simple cryptographic pseudorandom number generator with a small state, such as a simple fast key erasure PRNG, or just a stream cipher such as AES-CTR or ChaCha20 if you are confident (e.g., in a Monte Carlo simulation for research in the natural sciences) that there are no adverse consequences to predicting past outcomes if the state is ever compromised.