7

I'm using the new random number generators in C++11. Although there are varying opinions, from this thread it seems the majority believe they are not thread safe. Consequently, I would like to write a program in which each thread uses its own RNG.

An example is given in the related discussion of how to accomplish this with OpenMP:

#include <random>
#include <iostream>
#include <time.h>
#include "omp.h"

using namespace std;



int main()
{
    unsigned long long app = 0;
    {
        //mt19937_64 engine((omp_get_thread_num() + 1)); //USE FOR MULTITHREADING
        mt19937_64 engine; //USE FOR SINGLE THREAD
        uniform_real_distribution<double> zeroToOne(0.0, 1.0);

        //#pragma omp parallel for reduction(+:app) //USE FOR MULTITHREADING
        for (unsigned long long i = 0; i < 2000000000; i++)
        {
            if(zeroToOne(engine) < 0.5) app++;
        }
    }
    cout << app << endl;
    return 0;
}

When I run the multi-threaded and the single-threaded version of this program and keep track of the time, they take the same amount of time to finish. Also, app does not have the same value in the two cases, but I suspect that is merely because of the different seeds.
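For reference, this is roughly how the elapsed time could be measured with omp_get_wtime() (a sketch only, not part of the example above):

double start = omp_get_wtime();   // wall-clock time before the loop

// ... the loop from the example above ...

cout << "elapsed: " << (omp_get_wtime() - start) << " s" << endl;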

Question: Does the provided example correctly show how to force each thread to use its own RNG? If not, can I see an example of how this is done, or get a reference to somewhere that explains how to achieve this?

BillyJean

2 Answers

6

You must not share instances of a random engine between multiple threads. Either lock a single engine or create one engine per thread, each with a different seed (please note e4e5f4's answer regarding the creation of parallel MT engines; a locking sketch is shown after the code below). In the case of OpenMP you can easily store one engine per thread in a vector and retrieve it via the result of omp_get_thread_num(), which lies between 0 and omp_get_num_threads() - 1.

#include <random>
#include <vector>
#include <algorithm>
#include <omp.h>

class RNG
{
public:
    typedef std::mt19937 Engine;
    typedef std::uniform_real_distribution<double> Distribution;

    RNG() : engines(), distribution(0.0, 1.0)
    {
        // Create one engine per OpenMP thread, each with its own seed.
        int threads = std::max(1, omp_get_max_threads());
        for(int seed = 0; seed < threads; ++seed)
        {
            engines.push_back(Engine(seed));
        }
    }

    double operator()()
    {
        // Each thread reads only the engine at its own index.
        int id = omp_get_thread_num();
        return distribution(engines[id]);
    }

    std::vector<Engine> engines;
    Distribution distribution;
};

int main()
{
     RNG rand;
     unsigned long app = 0;

     #pragma omp parallel for reduction(+:app)
     for (unsigned long long i = 0; i < 2000000000; i++)
     {
         if(rand() < 0.5) app++;
     }
}
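For completeness, the other option mentioned at the top of this answer, locking a single shared engine, could look roughly like the sketch below (not part of the original answer; the critical section serializes every draw, so it will typically be far slower than one engine per thread):

#include <random>
#include <omp.h>

int main()
{
    std::mt19937 shared_engine;                             // single engine shared by all threads
    std::uniform_real_distribution<double> zeroToOne(0.0, 1.0);
    unsigned long long app = 0;

    #pragma omp parallel for reduction(+:app)
    for (unsigned long long i = 0; i < 2000000000; i++)
    {
        double x;
        // Only one thread at a time may touch the shared engine.
        #pragma omp critical
        {
            x = zeroToOne(shared_engine);
        }
        if (x < 0.5) app++;
    }
}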
hansmaad
  • Thank you for this example, which is very helpful. I have two questions: (1) Can I ask why you have chosen to include `: engines()`? Strictly speaking, is that required? .... (2) Am I allowed to use the object `rand` in a later loop in my program, which is not parallelized? – BillyJean Apr 10 '13 at 11:49
  • 1
    @BillyJean (1) Not required, but it's my personal style to call each element's ctor in the initializer list if at least one is called. (2) Not 100% sure, but I think `omp_get_thread_num()` returns 0 in a non-parallelized region, so yes. – hansmaad Apr 10 '13 at 12:02
  • A final question: Say I make `RNG` global and its object `rand` global too. Instead of the conditional `(rand() < 0.5)` I now call a global function `func`, which performs some calculations that depend on `rand`. Will the use of `rand` in `func` still be thread safe? I would say `yes`, but I would like to hear your professional opinion too. – BillyJean Apr 10 '13 at 13:12
  • 1
    It is safe. Please note that some distribution implementations (normal_distribution) may store state and thus must not be shared either! – hansmaad Apr 10 '13 at 13:20
  • Thanks for the heads up: But that is what I like about your example, I can readily extend it to any distribution offered by the library... – BillyJean Apr 10 '13 at 13:22
  • After thinking about your method, doesn't this require some sort of lock, as in adding the keyword `critical`..? It is the same problem as this poster has: http://stackoverflow.com/questions/6297345/alternative-to-mutex-lock-unlock-when-accessing-class-data. If a lock has to be implemented, it will turn out to be very inefficient – BillyJean Apr 11 '13 at 12:40
  • 1
    No lock required. operator() does not modify any shared resources. It reads from a vector, which is always safe as long as there is no parallel write to the same vector. All other resources (the engine retrieved from the vector and, optionally, a distribution retrieved from a vector) are not shared. – hansmaad Apr 11 '13 at 13:35
  • This accepted solution is racy, it gives a different result each time it is run and the execution time varies wildly. Creating a vector of Distribution objects (as well as engines) in RNG solves this problem. I guess Distribution has a state. – Yian Pap Dec 11 '20 at 10:24
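Following up on the last two comments, a variant of the class above that also keeps one distribution per thread might look like the sketch below (not from the original answer, just an illustration of the fix suggested in the comments):

#include <random>
#include <vector>
#include <algorithm>
#include <omp.h>

class RNG
{
public:
    typedef std::mt19937 Engine;
    typedef std::uniform_real_distribution<double> Distribution;

    RNG()
    {
        int threads = std::max(1, omp_get_max_threads());
        for (int seed = 0; seed < threads; ++seed)
        {
            // One engine and one distribution per thread, so no state is shared.
            engines.push_back(Engine(seed));
            distributions.push_back(Distribution(0.0, 1.0));
        }
    }

    double operator()()
    {
        int id = omp_get_thread_num();
        return distributions[id](engines[id]);
    }

    std::vector<Engine> engines;
    std::vector<Distribution> distributions;
};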
2

I would refrain from using random seeding. It might end up producing overlapping streams, which will eventually affect the final statistics.

I would suggest some tried and tested solution like this
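As a hedged sketch of the general idea (give each thread a well-mixed seed instead of consecutive integers), one could seed the per-thread engines from a std::seed_seq as below. Note that this is only an illustration with an arbitrary base seed and a hypothetical helper name; it is not the solution this answer refers to, and it still does not guarantee non-overlapping streams:

#include <random>
#include <vector>
#include <algorithm>
#include <omp.h>

// Hypothetical helper: build one engine per OpenMP thread, each seeded from a
// std::seed_seq so the initial states are well mixed. This does not guarantee
// statistically independent (non-overlapping) streams.
std::vector<std::mt19937_64> make_engines()
{
    int threads = std::max(1, omp_get_max_threads());
    std::vector<std::mt19937_64> engines;
    for (int i = 0; i < threads; ++i)
    {
        std::seed_seq seq{ 12345u, static_cast<unsigned>(i) };  // 12345u is an arbitrary base seed
        engines.push_back(std::mt19937_64(seq));
    }
    return engines;
}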

Nishanth
  • 2
    Your "tried and tested solution like this" gives warning: "Caution II: this is not tested well yet, so it may contain many bugs." :-) – hansmaad Apr 10 '13 at 11:19
  • I would rather trust MTDC than random seeding :) – Nishanth Apr 10 '13 at 12:10
    I just smiled when I read this note on the website. Nevertheless, the document is very interesting. When I implemented a parallelized PRNG for the first time I wondered how to seed the engines, but I could not find any information. Indeed, we never had (obvious) problems in Monte Carlo simulations using simple random seeding. – hansmaad Apr 10 '13 at 12:26