Deciding between random_device and seed_seq to generate seeds for multiple random number sequences

Question

When writing code that requires multiple independent random number distributions/sequences (example below with two), it seems that there are two typical ways to implement (pseudo-)random number generation. One is simply using a random_device object to generate two random seeds for the two independent engines:

std::random_device rd;
std::mt19937 en(rd());
std::mt19937 en2(rd());
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

The other involves using the random_device object to create a seed_seq object using multiple "sources" of randomness:

// NOTE: keeping this here for history, but a (hopefully) corrected version of
// this implementation is posted below the edit
std::random_device rd;
std::seed_seq seedseq{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::vector<uint32_t> seeds(5);
seedseq.generate(seeds.begin(), seeds.end());
std::mt19937 en3(seeds[0]);
std::mt19937 en4(seeds[1]);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

Out of these two, is there a preferred method? Why? If it is the latter, is there an optimal number of random_device "sources" to use in generating the seed_seq object?

Are there better approaches to random number generation than either of these two implementations I've outlined above?

Thank you!

Edit

(Hopefully) corrected version of seed_seq implementation for multiple distributions:

std::random_device rd;
std::seed_seq seedseq1{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::seed_seq seedseq2{rd(), rd(), rd()};
std::mt19937 en3(seedseq1);
std::mt19937 en4(seedseq2);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

You shouldn't extract values from a `seed_set` yourself. You're suppose to pass the entire vector of generated seeds to your random number generator, which will extract as many random seeds as it needs to optimally initialize itself. — François Andrieux, Nov 29 '17 at 21:51
@FrançoisAndrieux You are saying that I should have `std::mt19937 en3(seedseq);` and `std::mt19937 en4(seedseq);`? In that case, if you make both distributions the same (e.g., both `uniform_real_distribution`), the generated random number sequences are the same for both. — AnInquiringMind, Nov 29 '17 at 22:01
What I meant to say is that a single `int` from `seed_seq` is not sufficient to properly seed a random number generator like `mt19937`. It has more possible initial internal states than can be represented by an `int`. The purpose of `seed_seq` is to expand small amounts of randomness into a larger amount of pseudo-randomness. You should generate a distinct `seed_seq` for each generator. — François Andrieux, Nov 29 '17 at 22:09
@FrançoisAndrieux Okay, that makes sense. I included (hopefully) corrected code in my question, so as to not distract from the main thrust of the question. I believe that it should be correct now? Thanks! — AnInquiringMind, Nov 29 '17 at 22:18

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

std::seed_seq is generally intended to be used if you don't trust the default implementation to properly initialize the state of the engine you're using.

In many ≥C++11 implementations, std::default_random_engine is an alias for std::mt19937, which is a specific variant of the Mersenne Twister Pseudorandom Number Generation algorithm. Looking at the specification for std::mt19937, we see that it has a state of size 624 unsigned integers, which is enough to hold the 19937 bits of state it is intended to encompass (which is how it gets its name). Traditionally, if you seed it with only a single uint32_t value (which is what you would get from calling rd() once, if rd is a std::random_device object), then you're leaving the vast majority of its state uninitialized.

Now, the good news for anyone about to panic about their poorly-seeded Mersenne Twister engines is that if you construct a std::mt19937 with a single uint32_t value (like std::default_random_engine engine{rd()};), the implementation is required to initialize the rest of the state by permutating the original seed value, so while a single invocation of rd() yields a limited range of actual differing engine states, it's still sufficient to at least properly initialize the engine. This will yield a "Good Quality" random number generator.

But if you're worried about the engine not being properly seeded, either for cryptographic reasons (though note that std::mt19937 itself is NOT cryptographically secure!) or simply for statistical reasons, you can use a std::seed_seq to manually specify the entire state, using rd() to fill in each value, so that you can guarantee to a relative degree of confidence that the engine is properly seeded.

For casual use, or scenarios where it's not strictly necessary to achieve high quality random numbers, simply initializing with a single call to std::random_device::operator() is fine.

If you want to use a std::seed_seq, make sure you set it up correctly (the example in your original code is definitely not correct, at least for std::mt19937, and would actually yield much worse results than simply using rd()!). This post on CodeReview contains code which has been vetted properly.

Edit:

For the predefined templates of Mersenne Twister, the state size is always 19968 bits, which is slightly more than what it actually needs, but also the smallest value that can fully represent the range using uint32_t values. This works out to 624 Words of 32-bits each. So if you plan to use a Seed Sequence, you would correctly initialize it with 624 invocations to rd():

//Code copied from https://codereview.stackexchange.com/questions/109260/seed-stdmt19937-from-stdrandom-device
std::vector<uint32_t> random_data(624);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
std::mt19937 engine(seeds);
//Or:
//std::mt19937_64 engine(seeds);

If you're working with a non-standard instantiation of std::mersenne_twister_engine, the state size needed for that specific situation can be queried by multiplying its state_size by its word_size and then dividing by 32.

using mt_engine = std::mersenne_twister_engine</*...*/>;
constexpr size_t state_size = mt_engine::state_size * mt_engine::word_size / 32;
std::vector<uint32_t> random_data(state_size);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
mt_engine engine (seeds);

For other engine types, you'll need to evaluate them on a case-by-case basis. std::linear_congruential_engine and its predefined variants use a single integer of its word size, so they only require a single invocation of rd() to initialize, and thus Seed Sequences are unnecessary. I'm not sure how std::subtract_with_carry_engine or its associated-by-use std::discard_block_engine work, but it seems like they also only contain a single Word of state.

"you can use a std::seed_seq to manually specify the entire state, using rd() to fill in each value, so that you can guarantee to a relative degree of confidence that the engine is properly seeded." To specify the entire state, do I need to include 624 unsigned ints? Or will it simply initialize the rest of the state? Is specifying three (as in my example) okay, or should I specify more -- perhaps 5 or 10? I edited my question with hopefully corrected code -- has it been fixed? Thank you! — AnInquiringMind, Nov 29 '17 at 22:22
@PhysicsCodingEnthusiast [This is the reference for the `std::mt19937` constructor](http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine/mersenne_twister_engine), which includes details on how it behaves when you're not using a Seed Sequence. If you're going to use a `std::seed_seq`, you need to fill out every value for the entire range of the state you intend to specify (in this case, 19937 bits, or 624 `unsigned int`s), or you'll either leave 0's in the state (bad) or repeat those 3-5 numbers over and over (also bad). — Xirema, Nov 29 '17 at 22:26
@PhysicsCodingEnthusiast If you don't want to specify that whole range, then you either need to use a PRNG with a smaller state (`std::minstd_rand` comes to mind, which uses a single `uint32_t` as its state) or simply pass a single `uint32_t` as the seed instead of bothering with a Seed Sequence at all, which will at least be properly used to build a proper seed sequence in the constructor. — Xirema, Nov 29 '17 at 22:29
Thanks for the responses. Just curious: in the answers given here: https://stackoverflow.com/questions/34490599/c11-how-to-set-seed-using-random, they do not seem to specify the entire range of the state. Am I missing something, or are they simply doing a poor implementation? It would be really helpful if you could illustrate an efficient way to specify the entire range in your answer. Isn't `random_device` not good for generating many random numbers? — AnInquiringMind, Nov 29 '17 at 22:37
@user515430 I explained that in the succeeding paragraph. For someone insistent that I need to "Learn to Read", you really need to do a bit more of that yourself. — Xirema, Nov 30 '17 at 08:34
@Xirema Would you be able to answer the question posed in my previous comment (with the link to another SO question)? Those are my remaining questions regarding this matter. Thank you very much! — AnInquiringMind, Nov 30 '17 at 16:47
@PhysicsCodingEnthusiast I've updated the post with more information. — Xirema, Nov 30 '17 at 17:08
@Xirema Thanks for the additional information! So just to wrap up, I should view the answers given in https://stackoverflow.com/questions/34490599/c11-how-to-set-seed-using-random (in particular the answer by bames53) as unsatisfactory, given that they do not use 624 invocations of `rd()`? — AnInquiringMind, Nov 30 '17 at 17:18
@PhysicsCodingEnthusiast The accepted answer is fine, for the same reasons I already specified in my answer in the first section. The bames53 answer is definitely unacceptable because it defines a Seed Sequence that's too small for MT19937. — Xirema, Nov 30 '17 at 17:21
@Xirema There is one thing I'm still missing here. The code following "If you have multiple sources of randomness you can actually do this (e.g. 2)" in the accepted answer for that question uses only two invocations of random devices (`r1()` and `r2()`). Why is that okay there and not here? Is it assumed that those random devices yield more data/bits than `uint32_t`? (Sorry to keep asking questions, but I feel like I'm overlooking something.) Thanks! — AnInquiringMind, Nov 30 '17 at 17:27
@PhysicsCodingEnthusiast It's possible they're assuming that `std::default_random_engine` is an alias for one of the smaller engines like `std::minstd_rand` or `std::minstd_rand0` (which I think it is in GCC or some other environments), and thus a seed sequence with only one or two sources of randomness would be more than adequate. — Xirema, Nov 30 '17 at 17:31
@Xirema Ah, that would make sense. Thank you for all the info -- it's been quite illuminating! — AnInquiringMind, Nov 30 '17 at 17:35

Deciding between random_device and seed_seq to generate seeds for multiple random number sequences

Edit

1 Answers1

Edit:

Linked