Using operator>> to seed mt19937

Question

In a blog post entitled "C++ Seeding Surprises," Melissa E. O'Neill reports that, "When std::seed_seq tries to “fix” high-quality seed data, it actually makes it worse." According to O'Neill, a truly random seeding makes all states possible, but if you push such a seeding through std::seed_seq, it becomes less random, and certain states become unreachable through seeding.
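One way to see why this can happen: std::seed_seq::generate is a fixed, deterministic function of the words the seed_seq stores. It can redistribute entropy but never add any, and when 624 input words are mapped onto 624 output words, any two inputs that collide leave some output that no input can produce. The sketch below (the helper names seed_seq_output and seed_seq_param are hypothetical) only demonstrates the determinism and the fact that param() returns the stored input unchanged; it does not reproduce O'Neill's analysis of which states are unreachable.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds a std::seed_seq from `input` and returns the
// same number of words produced by generate().
std::vector<std::uint32_t> seed_seq_output(const std::vector<std::uint32_t>& input) {
    std::seed_seq seq(input.begin(), input.end());
    std::vector<std::uint32_t> out(input.size());
    seq.generate(out.begin(), out.end());
    return out;
}

// Hypothetical helper: returns the words that param() reports as stored.
std::vector<std::uint32_t> seed_seq_param(const std::vector<std::uint32_t>& input) {
    std::seed_seq seq(input.begin(), input.end());
    std::vector<std::uint32_t> stored(seq.size());
    seq.param(stored.begin());
    return stored;
}
```

Calling seed_seq_output twice on the same input returns identical vectors, and seed_seq_param(input) compares equal to input.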

So, if you have a good source of entropy, why not bypass seed_seq entirely?

That is what the function seed_randomly() below does. It is taken from my rand_replacement repository on GitHub, and it uses operator>> to overwrite all 624 state variables of mt19937.

#include <random>
#include <sstream>

template <typename ResultType>
class rand_replacement
{
public:
    using urbg_type = std::mt19937;
    using seed_type = typename std::mt19937::result_type;
private:
    urbg_type eng_{ seed_type{1u} };  // By default, rand() uses seed 1u.

    // ...

    void seed_randomly()
    {
        std::random_device rd;
        std::stringstream ss;
        for (auto i{ std::mt19937::state_size }; i--;)
            ss << rd() << ' ';
        ss >> eng_;
    }
};
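operator>> is designed to restore a state that operator<< previously wrote, and the standard guarantees that such a round trip produces an engine that compares equal to the original. The hypothetical helper copy_via_text below shows that intended use. seed_randomly() relies on something slightly different: that text which never came from operator<< will be accepted as a state, which assumes the textual representation is exactly state_size numbers. That is the format the standard specifies for mersenne_twister_engine, but printing an engine with your own library is a cheap way to confirm it adds nothing extra.

```cpp
#include <random>
#include <sstream>

// Hypothetical helper: copies an engine's full state through its textual
// representation. operator<< writes the state; operator>> reads it back.
std::mt19937 copy_via_text(const std::mt19937& source) {
    std::stringstream ss;
    ss << source;
    std::mt19937 target;  // default-seeded; its state is replaced below
    ss >> target;
    return target;
}
```

After `std::mt19937 a(12345u); a.discard(1000);`, the engine returned by copy_via_text(a) compares equal to a and produces the same next value.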

Is this a novel and interesting idea, or is it really foolish?

Regarding std::stringstream: I understand that it is relatively slow, but that's okay. Seeding should be an infrequent operation.

Regarding std::random_device: I understand that random_device may be deterministic on some systems, may block on other systems, and also that it has a checkered history with MinGW, but for now, at least, I am satisfied with it. My question is not about random_device; it is strictly focused on the idea of bypassing seed_seq using operator>>, a technique that could be used with any entropy source.

Are there any downsides?

By the way, the alternative, which uses seed_seq, is a little more complex and looks something like the following. Is it a better choice than what I coded above?

    void seed_randomly()
    {
        std::random_device rd;
        std::array<seed_type, std::mt19937::state_size> seeds;
        for (auto& s : seeds)
            s = rd();
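        // Non-const: eng_.seed() calls sseq.generate(), which is not a const member.
        // Parentheses, not braces, so that only the iterator-pair constructor can
        // match (braces would make the initializer_list constructor a candidate first).
        std::seed_seq sseq( std::cbegin(seeds), std::cend(seeds) );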
        eng_.seed(sseq);
    }

How about using the Sseq constructor? I mean, why make it complicated if you could make it simple? Occam, etc. — Severin Pappadeux, Jul 10 '23 at 03:41
Thanks, Severin. 1. The default initializer for eng_ uses seed 1u so that I can mimic the behavior of std::rand(). 2. seed_randomly() avoids std::seed_seq by design. According to the cited blog post by M.E. O'Neill, a truly random seeding makes all states possible, but if you push such a seeding through std::seed_seq, it becomes less random, and certain states become unreachable through seeding. — tbxfreeware, Jul 10 '23 at 05:05
I have edited the original post to include this explanation. — tbxfreeware, Jul 10 '23 at 05:49
I think Melissa is being overly pedantic in that article. If you're really concerned about the seed uniformly sampling every MT state then you'd be much better off starting with a better RNG. Assuming you're targeting a 64bit processor, I'd go with either PCG64 DXSM or xoshiro256** and just seed it directly. — Sam Mason, Jul 10 '23 at 09:51
Thanks, Sam. Funny you should say. I'm in the middle of a deep dive into PCG, having completely refactored/rewritten O'Neill's code! Is the following a fair characterization of your thinking? If forced to use `mt19937`, you would be content with the `seed_seq` alternative I added above, but you do not see anything particularly wrong with the version that uses `operator>>`. — tbxfreeware, Jul 10 '23 at 15:55
`std::seed_seq` does lots of extra stuff to the values under the assumption they're biased. given the definition of `random_device`, this seems like the wrong thing to be doing. think I'd just implement `generate` using a `random_device` directly. something like https://godbolt.org/z/K4arc1M98 — Sam Mason, Jul 11 '23 at 15:23
@SamMason Exactly what I was proposing, sorry for being unclear: a simple class implementing the Sseq interface (generate and so on), but just acting as a holder/passthrough for the random_device output. The standard std::seed_seq does a lot of bit shuffling, which you don't need, since you use random_device. — Severin Pappadeux, Jul 11 '23 at 18:08
@SeverinPappadeux didn't realise this is what you meant! have tried to turn this into an answer — Sam Mason, Jul 11 '23 at 21:17
Thanks, guys. This is perfect. Avoids both the overhead of `std::stringstream` and the bit-twiddling of `std::seed_seq`. I'm going to investigate what other parts of `seed_seq` need to be implemented in faux versions in order to conform to the definition in the C++ standard. If they are not overly cumbersome, I'll throw them in when I post an official answer to this thread. The idea is to avoid conflicting with any future concept that might block entrance to member function `seed(sseq)` in a standard-conforming random number engine. — tbxfreeware, Jul 11 '23 at 21:34
Yep, best choice. I would modify @SamMason's code to hold an array inside your sseq (or whatever you called it), so that you could provide proper size() and param() implementations. generate() would then be just std::copy(). — Severin Pappadeux, Jul 11 '23 at 22:15
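Severin Pappadeux's suggestion, a seed sequence that holds an array so that size() and param() have something to report and generate() reduces to a copy, could look roughly like this. It is a sketch under stated assumptions: the name rd_snapshot_seed, the fixed length N, and throwing std::length_error when an engine asks for more than N words are illustrative choices, not code from the thread, and the sketch omits the iterator and initializer_list constructors.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <stdexcept>

// Hypothetical passthrough seed sequence: snapshots N words from
// std::random_device when constructed, then hands them out unchanged.
// No seed_seq-style mixing is applied.
template <std::size_t N>
class rd_snapshot_seed {
public:
    using result_type = std::uint32_t;

    rd_snapshot_seed() {
        std::random_device rd;
        for (auto& w : words_)
            w = static_cast<result_type>(rd());
    }

    // Copies the first (end - begin) stored words into [begin, end).
    // Illustrative policy: throw if the engine asks for more than N words.
    template <typename RandomIt>
    void generate(RandomIt begin, RandomIt end) const {
        const auto requested = std::distance(begin, end);
        if (requested < 0 || static_cast<std::size_t>(requested) > N)
            throw std::length_error("rd_snapshot_seed: not enough stored words");
        std::copy_n(words_.begin(), requested, begin);
    }

    // Number of 32-bit words that param() writes.
    std::size_t size() const noexcept { return N; }

    // Writes the stored words, the same ones generate() hands out.
    template <typename OutputIt>
    void param(OutputIt dest) const {
        std::copy(words_.begin(), words_.end(), dest);
    }

private:
    std::array<result_type, N> words_{};
};
```

Because the words are captured once, at construction, seeding two engines from the same object gives them identical states; a passthrough like Sam Mason's instead draws fresh values on every generate() call. std::mt19937 requests 624 words (its state_size), so rd_snapshot_seed<624> is enough for it.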

score 4 · Answer 1 · answered Jul 11 '23 at 21:14

As alluded to at the end of the article, it makes sense to bypass std::seed_seq, but using operator>> doesn't seem like a great way of going about it. Providing an alternative implementation of a SeedSequence allows the MT's state to be populated directly from a std::random_device.

Something like:

#include <random>

struct rd_seed {
    using result_type = std::random_device::result_type;
    template< class RandomIt >
    void generate( RandomIt begin, RandomIt end ) {
        for ( std::random_device rd; begin != end; begin++ )
            *begin = rd();
    }
};

void seed(std::mt19937 &rng) {
    rd_seed seed;
    rng.seed(seed);
}

Melissa also suggested that it would be better if something like random_device provided a generate() method like this directly, rather than having to make many calls into the OS to collect the state 32 bits at a time.
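How many random_device calls does rd_seed make per seeding? For mersenne_twister_engine, the standard has seed(q) call q.generate once on an array of n × ⌈w/32⌉ 32-bit words: 624 × 1 for std::mt19937 and 312 × 2 for std::mt19937_64, so 624 in both cases. The hypothetical counting_seed below just records the length of the range it is asked to fill. It writes dummy values, so it is for inspection only, not for real seeding.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>

// Hypothetical diagnostic seed sequence: records how many 32-bit words an
// engine's seed(q) asks generate() to fill. It writes dummy values, so it is
// for inspection only, never for real seeding.
struct counting_seed {
    using result_type = std::uint32_t;
    std::size_t requested = 0;

    template <typename RandomIt>
    void generate(RandomIt begin, RandomIt end) {
        requested = static_cast<std::size_t>(std::distance(begin, end));
        std::uint32_t dummy = 1;  // arbitrary non-zero filler
        for (; begin != end; ++begin)
            *begin = dummy++;
    }
};

// Seeds a default-constructed Engine from counting_seed and reports the count.
template <typename Engine>
std::size_t words_requested_by() {
    Engine e;
    counting_seed s;
    e.seed(s);
    return s.requested;
}
```

words_requested_by<std::mt19937>() and words_requested_by<std::mt19937_64>() should both report 624, which is also the number of rd() calls rd_seed makes for either engine.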

Yep, cookie points to you. I would modify the code to hold an array inside your rd_seed, so that you could provide proper size() and param() implementations. generate() would then be just std::copy(). — Severin Pappadeux, Jul 11 '23 at 22:17
Thanks to @Sam Mason and @Severin Pappadeux for helping out with this. Indeed, bypassing `std::seed_seq` was a good idea, and those guys nailed the best way to do it. — tbxfreeware, Jul 12 '23 at 00:36

tbxfreeware · Accepted Answer · 2023-07-29T03:02:19.780

class seed_seq_rd – mimics the complete interface of seed_seq

The conclusion reached here is that it is wise to bypass std::seed_seq when you have a truly random source of seeding data.

On many systems, but perhaps not all, std::random_device qualifies as such a source. Its potential pitfalls are well known. This answer assumes that std::random_device is a reliable source for random seeds.

A further conclusion is that the operator>> solution given in the question is suboptimal. It works as it stands, but the overhead of std::stringstream slows things down unnecessarily. A better solution is to create a custom seed sequence that generates seeds directly, without serializing them and pushing them through std::stringstream.

Based on the ideas of @Sam Mason and @Severin Pappadeux, I came up with class tbx::seed_seq_rd, which implements the complete interface of std::seed_seq. It performs only basic checks of its template arguments. Other than that, it complies with all requirements of a seed sequence as defined in the C++ standard.

The reason for implementing the complete interface is so that seed_seq_rd will satisfy whatever concepts or SFINAE constraints may guard the seed member function of a standard-conforming random number engine.
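To make "the complete interface" concrete, the hypothetical C++14-compatible trait below checks the expressions listed in the standard's seed sequence requirements: result_type, S(), S(first, last), S(il), generate(), size() and param(). It checks only that the expressions compile, not what they do, and it is not the check any particular standard library performs; which checks a given library actually applies is implementation-specific.

```cpp
#include <cstdint>
#include <initializer_list>
#include <random>
#include <type_traits>
#include <utility>

namespace detail {
// C++14-compatible void_t (std::void_t only arrived in C++17).
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;
}  // namespace detail

// Hypothetical trait: true when S has result_type and the generate(), size()
// and param() member expressions from the seed sequence requirements.
// It checks that the expressions compile, not what they do.
template <typename S, typename = void>
struct has_seed_seq_members : std::false_type {};

template <typename S>
struct has_seed_seq_members<S, detail::void_t<
    typename S::result_type,
    decltype(std::declval<S&>().generate(std::declval<std::uint32_t*>(),
                                         std::declval<std::uint32_t*>())),
    decltype(std::declval<const S&>().size()),
    decltype(std::declval<const S&>().param(std::declval<std::uint32_t*>()))>>
    : std::true_type {};

// Hypothetical trait: the members above plus the three required
// constructors S(), S(first, last) and S(il).
template <typename S>
struct has_seed_seq_interface
    : std::integral_constant<bool,
          has_seed_seq_members<S>::value &&
          std::is_default_constructible<S>::value &&
          std::is_constructible<S, const std::uint32_t*, const std::uint32_t*>::value &&
          std::is_constructible<S, std::initializer_list<std::uint32_t>>::value> {};
```

std::seed_seq passes this check. Sam Mason's rd_seed would not, since it provides only result_type and generate(), even though std::mt19937::seed accepted it in the testing reported in this thread; a stricter future constraint is the scenario the complete interface guards against.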

Using it is simple.

// Example: Seed mt19937 with random seeds from std::random_device.
std::mt19937 mt;
tbx::seed_seq_rd s;
mt.seed( s );

// Example: Seed pcg32, one of the PCG engines by Melissa O'Neill.
pcg32 e;
e.seed( s );  // seed_seq_rd object can be reused.

Function seed_randomly, from my original question, is now templated, and works with any random number engine in the C++ Standard Library. It also works with PCG, by Melissa O'Neill, and any other random number engine that can be seeded with a seed sequence.

template< typename RandomEngine >
void seed_randomly( RandomEngine& e ) {
    tbx::seed_seq_rd s;
    e.seed( s );
}

// Example: Seed mt19937 with random seeds from std::random_device.
std::mt19937 mt;
tbx::seed_randomly( mt );

// Example: Seed pcg32, one of the PCG engines by Melissa O'Neill.
pcg32 e;
tbx::seed_randomly( e );

I tested with MSVC, and was able to seed all of the engines from the standard library, as well as pcg32, a PCG engine by Melissa O'Neill.

I put some polish on seed_seq_rd, so that it is suitable as a library routine, and uploaded the source code to GitHub.

Source code for a short demo program is also on GitHub. The demo is a complete program, so you should be able to download and compile it without much fiddling. I had my compiler set to C++14.
