51

I've read that many pseudo-random number generators require many samples in ordered to be "warmed up". Is that the case when using std::random_device to seed std::mt19937, or can we expect that it's ready after construction? The code in question:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());
David G
  • 94,763
  • 41
  • 167
  • 253
Brent
  • 4,153
  • 4
  • 30
  • 63
  • 8
    Where did you read that? I never heard of it, all I know is that they should be seeded... – PlasmaHH Mar 19 '13 at 20:02
  • 2
    For example, there is some discussion of it in this paper: http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf – Brent Mar 22 '14 at 00:12
  • 2
    For the majority of PRNGs this makes no sense at all. Seeding sets the internal state, and any "warming up" permutates the internal state, as such it has the exact same effect had this new state been chosen as a seed. – PlasmaHH Mar 22 '14 at 09:50
  • FWIW many advise against `std::random_device` as it can throw at any time for all kinds of nonsense reasons. You could wrap it up in a try..catch block, but I'd recommend using a platform specific way to get a random number: on Windows, use the Crypto API, otherwise use `/dev/urandom/`. – BoltzmannBrain Dec 06 '18 at 16:50

3 Answers3

64

Mersenne Twister is a shift-register based pRNG (pseudo-random number generator) and is therefore subject to bad seeds with long runs of 0s or 1s that lead to relatively predictable results until the internal state is mixed up enough.

However the constructor which takes a single value uses a complicated function on that seed value which is designed to minimize the likelihood of producing such 'bad' states. There's a second way to initialize mt19937 where you directly set the internal state, via an object conforming to the SeedSequence concept. It's this second method of initialization where you may need to be concerned about choosing a 'good' state or doing warmup.


The standard includes an object conforming to the SeedSequence concept, called seed_seq. seed_seq takes an arbitrary number of input seed values, and then performs certain operations on these values in order to produce a sequence of different values suitable for directly setting the internal state of a pRNG.

Here's an example of loading up a seed sequence with enough random data to fill the entire std::mt19937 state:

std::array<int, 624> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));

std::mt19937 eng(seq);

This ensures that the entire state is randomized. Also, each engine specifies how much data it reads from the seed_sequence so you may want to read the docs to find that info for whatever engine you use.

Although here I load up the seed_seq entirely from std::random_device, seed_seq is specified such that just a few numbers that aren't particularly random should work well. For example:

std::seed_seq seq{1, 2, 3, 4, 5};
std::mt19937 eng(seq);

In the comments below Cubbi indicates that seed_seq works by performing a warmup sequence for you.

Here's what should be your 'default' for seeding:

std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 rng(seed);
kevinarpe
  • 20,319
  • 26
  • 127
  • 154
bames53
  • 86,085
  • 15
  • 179
  • 244
  • 3
    Coincidentally, the default C++11 `seed_seq` *is* the Mersenne Twister warmup sequence (although the existing implementations, libc++'s `mt19937` for example, use a simpler warmup when a single-value seed is provided) – Cubbi Mar 19 '13 at 22:40
  • 1
    The `std::generate_n` generates SDL Warning C4996 on newer MSVC compilers so you can use `std::generate(seed_data.begin(), seed_data.end(), std::ref(r));` instead. – NightElfik Feb 18 '14 at 02:33
  • Is it safe to assume that a generator's `state_size` is always expressed in multiples of `sizeof(int)`? If a generator had a `state_size` measured in bytes, for example, the `seed_data` array would be too large. – Wyzard Mar 20 '14 at 05:41
  • 3
    @Wyzard Each engine class specifies exactly how many values it reads from a `seed_seq`. It just happens to be `state_size` for the class `std::mt19937`, but it's `2*std::mt19937_64::state_size` for `std::mt19937_64`, for example. You have to read the docs for the specific engine. It looks like the online references don't go into this detail, but it's covered in the standard. – bames53 Mar 20 '14 at 18:54
  • Hmm. I'm surprised it isn't defined as a constant in the class for use by generic code, but at least it's well-specified. – Wyzard Mar 21 '14 at 23:50
  • 1
    While standard conforming your answer can lead to bad results because you do not create the internal state directly, but instead use `std::seed_seq` which modifies the given values. For more info look [here](http://www.pcg-random.org/posts/cpp-seeding-surprises.html). Though guessing from your nick you already found that on reddit where you posted a nice alternative. – mfuchs Apr 21 '15 at 18:33
  • @mat69 Yes, I'm aware of /u/ProfONeill's work, agree with her criticisms in principle and look forward to her proposals to improve things. You can see her comment on this stackoverflow answer [here](http://www.reddit.com/r/cpp/comments/31857s/random_number_generation_it_might_be_harder_than/cq06qaq). Also note that the primary criticism of `seed_seq` in that article becomes of less and less practical significance as the input randomness increases. ProfONeil [says](http://www.reddit.com/r/cpp/comments/32u4m7/the_behavior_of_rng_seeding_in_c_may_surprise/cqexc14) 256 bits is probably enough. – bames53 Apr 21 '15 at 19:22
  • What about the default constructor for the engine, does it use a reasonable seed? – BeeOnRope May 04 '17 at 21:28
  • @BeeOnRope It's reasonable in the sense that it's not one of seeds that produces bad results, but its fixed to a specific value, so you always know what values a default constructed mt19937 engine will produce. – bames53 May 05 '17 at 01:31
  • Many advise against `std::random_device` as it can throw at any time for all kinds of nonsense reasons. You could wrap it up in a try..catch block, but I'd recommend using a platform specific way to get a random number: on Windows, use the Crypto API, otherwise use `/dev/urandom/`. – BoltzmannBrain Dec 06 '18 at 16:49
5

If you seed with just one 32-bit value, all you will ever get is one of the same 2^32 trajectories through state-space. If you use a PRNG with KiBs of state, then you should probably seed all of it. As described in the comments to @bames63' answer, using std::seed_seq is probably not a good idea if you want to init the whole state with random numbers. Sadly, std::random_device does not conform to the SeedSequence concept, but you can write a wrapper that does:

#include <random>
#include <iostream>
#include <algorithm>
#include <functional>

class random_device_wrapper {
    std::random_device *m_dev;
public:
    using result_type = std::random_device::result_type;
    explicit random_device_wrapper(std::random_device &dev) : m_dev(&dev) {}
    template <typename RandomAccessIterator>
    void generate(RandomAccessIterator first, RandomAccessIterator last) {
        std::generate(first, last, std::ref(*m_dev));
  }
};

int main() {

    auto rd = std::random_device{};
    auto seedseq = random_device_wrapper{rd};
    auto mt = std::mt19937{seedseq};
    for (auto i = 100; i; --i)
        std::cout << mt() << std::endl;

}

This works at least until you enable concepts. Depending on whether your compiler knows about SeedSequence as a C++20 concept, it may fail to work because we're supplying only the missing generate() method, nothing else. In duck-typed template programming, that code is sufficient, though, because the PRNG does not store the seed sequence object.

Marc Mutz - mmutz
  • 24,485
  • 12
  • 80
  • 90
2

I believe there are situations where MT can be seeded "poorly" which results in non-optimal sequences. If I remember correctly, seeding with all zeroes is one such case. I would recommend you try to use the WELL generators if this is a serious issue for you. I believe they are more flexible - the quality of the seed does not matter as much. (Perhaps to answer your question more directly: it's probably more efficient to focus on seeding well as opposed to seeding poorly then trying to generate a bunch of samples to get the generator to an optimal state.)

Mayur Patel
  • 945
  • 1
  • 7
  • 15