6

What is the best way to store the state of a C++11 random generator without using the iostream interface. I would like to do like the first alternative listed here[1]? However, this approach requires that the object contains the PRNG state and only the PRNG state. In partucular, it fails if the implementation uses the pimpl pattern(at least this is likely to crash the application when reloading the state instead of loading it with bad data), or there are more state variables associated with the PRNG object that does not have to do with the generated sequence.

The size of the object is implementation defined:

I am missing member functions like

  1. size_t state_size();
  2. const size_t* get_state() const;
  3. void set_state(size_t n_elems,const size_t* state_new);

(1) shall return the size of the random generator state array

(2) shall return a pointer to the state array. The pointer is managed by the PRNG.

(3) shall copy the buffer std::min(n_elems,state_size()) from the buffer pointed to by state_new

This kind of interface allows more flexible state manipulation. Or are there any PRNG:s whose state cannot be represented as an array of unsigned integers?

[1]Faster alternative than using streams to save boost random generator state

Community
  • 1
  • 1
user877329
  • 6,717
  • 8
  • 46
  • 88
  • You may find [this question](http://stackoverflow.com/questions/11563963/writing-a-binary-file-in-c-very-fast) helpful. Other than that, I don't think it's possible to serialize an RNG (or any object, really) without some knowledge about the underlying implementation. If it's at all possible, it will probably involve some... kinky hackery. – More Axes Jun 21 '14 at 15:57
  • @MoreAxes The mentioned question is unrelated. Also, this is not really a performance issue, but rather an interface compatibility issue: the I/O interface does not derive from any iostream class and I cannot use the supplied methods without first copying to stringstream, then convert back into binary, and finally write it using the ChunkIO::Writer::dataWrite function. – user877329 Jun 21 '14 at 16:09
  • 1
    `g++`'s and Ideone's `sizeof std::mt19937` returned values respectively 8 and 4 bytes greater than what is necessary to store the Mersenne Twister's state array. If this is a single value in either case, then I'd wager it's a pointer (I'm assuming you're on a 64-bit system and Ideone isn't), which you'd need treat accordingly during serialization. If it's a non-pointer value (or two of them in `g++`'s case), it may be safe to serialize it as-is. – More Axes Jun 21 '14 at 16:21
  • 3
    You could also try a different approach: make a wrapper class that stores the seed and the number of invocations, and store only those. During deserialization, seed the RNG with the stored seed, and invoke it that many times. This may be a bit wonky since I'm not sure if C++11's distributions always request the same number of pseudorandom bytes from the RNG they're called with, but it seems worth a try. It would obviously be rather slow during deserialization, but serialization would be blazing fast. – More Axes Jun 21 '14 at 16:23
  • @MoreAxes It is a size value: `_UIntType _M_x[state_size];size_t _M_p;` `_M_p` is the state size, which is written to the serialization as well. How long time does it take to generate 1e6 random numbers? – user877329 Jun 21 '14 at 17:16
  • According to Boost's [performace analysis of RNGs](http://www.boost.org/doc/libs/1_55_0/doc/html/boost_random/performance.html), a million random numbers should take about 5ms to generate, as implemented by Boost. I would expect the standard implementation to be no worse than that, but stranger things have happened. Where did you find this `_UIntType _M_x[state_size]; size_t _M_p;` business? – More Axes Jun 21 '14 at 17:22
  • @MoreAxes: random.h/random.tcc from the g++ c++ library. It do write _M_p when I use operator<< so with the current implementation, the only trouble I have compared to the standardized way is that on 32-bit I need to pad _M_p with four bytes to be compatible. – user877329 Jun 21 '14 at 17:44
  • @MoreAxes: What is the performance benchmark for converting to/from string representation? – user877329 Jun 21 '14 at 17:55
  • @MoreAxes: The `discard()` member function seems to be well suited for that task. – Kerrek SB Jun 21 '14 at 17:56
  • @KerrekSB Indeed it is. As for converting to and from string representations, you should use a binary representation when performace matters, but keep in mind endianness issues. – More Axes Jun 21 '14 at 17:58

1 Answers1

0

I've written a simple (-ish) test for the approach I mentioned in the comments of the OP. It's obviously not battle-tested, but the idea is represented - you should be able to take it from here.

Since the amount of bytes read is so much smaller than if one were to serialize the entire engine, the performance of the two approaches might actually be comparable. Testing this hypothesis, as well as further optimization, are left as an exercise for the reader.

#include <iostream>
#include <random>
#include <chrono>
#include <cstdint>
#include <fstream>

using namespace std;

struct rng_wrap
{
    // it would also be advisable to somehow
    // store what kind of RNG this is,
    // so we don't deserialize an mt19937
    // as a linear congruential or something,
    // but this example only covers mt19937

    uint64_t seed;
    uint64_t invoke_count;
    mt19937 rng;

    typedef mt19937::result_type result_type;

    rng_wrap(uint64_t _seed) :
        seed(_seed),
        invoke_count(0),
        rng(_seed)
    {}

    rng_wrap(istream& in) {
        in.read(reinterpret_cast<char*>(&seed), sizeof(seed));
        in.read(reinterpret_cast<char*>(&invoke_count), sizeof(invoke_count));
        rng = mt19937(seed);
        rng.discard(invoke_count);
    }

    void discard(unsigned long long z) {
        rng.discard(z);
        invoke_count += z;
    }

    result_type operator()() {
        ++invoke_count;
        return rng();
    }

    static constexpr result_type min() {
        return mt19937::min();
    }

    static constexpr result_type max() {
        return mt19937::max();
    }
};

ostream& operator<<(ostream& out, rng_wrap& wrap)
{
    out.write(reinterpret_cast<char*>(&(wrap.seed)), sizeof(wrap.seed));
    out.write(reinterpret_cast<char*>(&(wrap.invoke_count)), sizeof(wrap.invoke_count));
    return out;
}

istream& operator>>(istream& in, rng_wrap& wrap)
{
    wrap = rng_wrap(in);
    return in;
}

void test(rng_wrap& rngw, int count, bool quiet=false)
{
    uniform_int_distribution<int> integers(0, 9);
    uniform_real_distribution<double> doubles(0, 1);
    normal_distribution<double> stdnorm(0, 1);

    if (quiet) {
        for (int i = 0; i < count; ++i)
            integers(rngw);

        for (int i = 0; i < count; ++i)
            doubles(rngw);

        for (int i = 0; i < count; ++i)
            stdnorm(rngw);
    } else {
        cout << "Integers:\n";
        for (int i = 0; i < count; ++i)
            cout << integers(rngw) << " ";

        cout << "\n\nDoubles:\n";
        for (int i = 0; i < count; ++i)
            cout << doubles(rngw) << " ";

        cout << "\n\nNormal variates:\n";
        for (int i = 0; i < count; ++i)
            cout << stdnorm(rngw) << " ";
        cout << "\n\n\n";
    }
}


int main(int argc, char** argv)
{
    rng_wrap rngw(123456790ull);

    test(rngw, 10, true);  // this is just so we don't start with a "fresh" rng
    uint64_t seed1 = rngw.seed;
    uint64_t invoke_count1 = rngw.invoke_count;

    ofstream outfile("rng", ios::binary);
    outfile << rngw;
    outfile.close();

    cout << "Test 1:\n";
    test(rngw, 10);  // test 1

    ifstream infile("rng", ios::binary);
    infile >> rngw;
    infile.close();

    cout << "Test 2:\n";
    test(rngw, 10);  // test 2 - should be identical to 1

    return 0;
}
More Axes
  • 249
  • 3
  • 12