I am working with legacy code, which I didn't write, that generates random data. Its output changed after Boost was updated from 1.58 to 1.67. Normally a fixed seed makes the output reproducible, but the old and new versions now produce different results.

The Boost random distributions used include uniform_int, uniform_real, exponential_distribution and normal_distribution. Does anyone know specifically whether one or more of these changed between the Boost versions I've mentioned?

I may have to write a simple test program to find out for sure.

learning2learn
  • `I may have to write a simple test prog to ascertain this for sure.` - please do. Until then, VTC. – SergeyA May 30 '19 at 19:11
  • Without a [mcve] it is hard to impossible to help you. – Jesper Juhl May 30 '19 at 19:49
  • As an aside, if you're relying on a [pseudo-]random number generator to give you specific values, you're probably doing it wrong. It may seem useful to pick a single seed and then treat the resulting sequence as predictable, but (a) that's not sound in principle, and (b) as you've seen it's not even sound in practice. If you want a fixed sequence, define one! Treat PRNG output as "random", even though it's not (unless you actually need it to be lol) – Lightness Races in Orbit May 31 '19 at 11:47
  • The comment by @LightnessRacesinOrbit is incorrect: deterministic pseudo-random generators are both feasible and useful, with applications in scientific modeling (ensuring results can be reproduced), cryptography (stream ciphers) and games (map generation reproducible from a given seed). – dhardy Jul 03 '19 at 09:15
  • It appears that, specifically regarding the Boost distributions, one should not depend on their output. See https://github.com/boostorg/random/issues/56 – dhardy Jul 03 '19 at 10:52
  • @dhardy You slotted in the word "deterministic", and if you _know_ you have such a thing, then fine! Useful yes. But, as you've seen, that is not always the case and should not be relied on _in general_. Only when you have very specified guarantees can you consider using them in such a manner. The hint for intended usage is the term "random" in "pseudo-random". – Lightness Races in Orbit Jul 03 '19 at 12:21

2 Answers

At least the normal and exponential distributions were changed to use (an improved version of) the Ziggurat method in July 2016; see https://github.com/boostorg/random/commit/c7d1b4f3516098b3e2fc8f8531d716881ab5834e. This particular change first appeared in version 1.62 (released October 2016). I did not check further back in time.
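
If the output must stay identical across library upgrades, one option is to own the distribution code instead of relying on the library's. The following is only a sketch under assumptions: it swaps in `std::mt19937` (whose output sequence is fixed by the C++ standard) for the engine and a plain Box-Muller transform for the distribution. It is not what Boost does in any version, and the helper names `to_unit_open` and `box_muller_normal` are hypothetical. The last digits can still depend on the platform's `log`, `cos` and `sqrt`.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper (not a Boost or standard API): maps one 32-bit engine
// output to a double strictly inside (0, 1), so std::log never sees zero.
inline double to_unit_open(std::uint32_t x) {
    return (static_cast<double>(x) + 0.5) / 4294967296.0;  // divide by 2^32
}

// Hypothetical helper: one normal variate via the basic Box-Muller transform.
// It always consumes exactly two engine outputs, so the mapping from engine
// stream to result stream is fixed by this code, not by a library version.
inline double box_muller_normal(std::mt19937& rng, double mean, double sigma) {
    const double two_pi = 6.283185307179586;
    const double u1 = to_unit_open(static_cast<std::uint32_t>(rng()));
    const double u2 = to_unit_open(static_cast<std::uint32_t>(rng()));
    return mean + sigma * std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}
```

A caller would construct `std::mt19937 rng(seed);` once and call `box_muller_normal(rng, 0.0, 10.0)` in a loop. The trade-off is speed: this discards the second variate that Box-Muller could produce and is slower than a Ziggurat implementation.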

Ralf Stubner

I found that the normal distribution definitely differs, using a test program modeled on this other answer: Boost random number generator. In typical usage my utility appears to use only the normal distribution; the exponential distribution would be triggered by a different option. I first confirmed differences over several thousand iterations between the two versions I originally mentioned. Then, after seeing the answer from Ralf Stubner (thank you), I did more exhaustive testing, and I see a confirmed difference at 1.64 relative to 1.57. After that, the output is consistent again at least up to 1.67. I tried 1.57 through 1.67.

Compiled a test program like this:
g++ -I /opt/boost_1_57_0 random_example.cc -O3 -o random_example

Invoked like:
random_example 0 0 9999999 > /tmp/random_example_boost_1_57_0_0_0_9999999.txt

# number of differences in ten million lines
root@ubuntu-02:/tmp# baseline=random_example_boost_1_57_0_0_0_9999999.txt
root@ubuntu-02:/tmp# test=random_example_boost_1_64_0_0_0_9999999.txt
root@ubuntu-02:/tmp# diff -U 0 $baseline $test | grep ^@ | wc -l 
5796
# look at first 5 lines of difference
root@ubuntu-02:/tmp# diff $baseline $test | head -5
261,262c261
< -36.8701
< -3.78609
---
> -38.8405
root@ubuntu-02:/tmp# 
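
The same first-divergence check can be done without `diff`, for example inside a regression test. This is a minimal sketch; the helper name `first_difference` is hypothetical, and it only reports where two outputs first disagree rather than realigning them the way `diff` does. For the two files above it would be expected to report line 261, matching the first hunk.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the 1-based number of the first line at which
// the two streams differ, or 0 if they are identical. A stream that ends
// early counts as differing at the first line it lacks.
inline std::size_t first_difference(std::istream& a, std::istream& b) {
    std::string la, lb;
    std::size_t line = 0;
    for (;;) {
        const bool has_a = static_cast<bool>(std::getline(a, la));
        const bool has_b = static_cast<bool>(std::getline(b, lb));
        ++line;
        if (!has_a && !has_b) return 0;               // both ended together
        if (has_a != has_b || la != lb) return line;  // length or content differs
    }
}
```

To compare files, open each with `std::ifstream` and pass both streams to the helper.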

The example code:

random_example.cc: 
#include <cstdlib>   // std::atoi
#include <iostream>
#include "boost/random.hpp"

// Usage: random_example [seed] [start] [stop]
// Prints one normally distributed value (mean 0, sigma 10) per line,
// for each i in [start, stop].
int main(int argc, char **argv) {
    typedef boost::rand48 RNGType;
    typedef boost::normal_distribution<> dist_t;

    // Defaults, overridden by the optional command-line arguments.
    int seed = 0;
    int start = 1;
    int stop = 100;
    if (argc >= 2) {
        seed = std::atoi(argv[1]);
    }
    if (argc >= 3) {
        start = std::atoi(argv[2]);
    }
    if (argc >= 4) {
        stop = std::atoi(argv[3]);
    }

    RNGType rng(seed);
    dist_t distribution_params(0.0, 10.0);
    boost::variate_generator<RNGType, dist_t> dice(rng, distribution_params);
    for (int i = start; i <= stop; i++) {
        double n = dice();
        std::cout << n << std::endl;
    }
    return 0;
}
learning2learn