I am implementing a Monte Carlo simulation, where I need to run multiple realisations of some dynamics and then take an average over the end state for all the simulations. Since the number of realisation is large, I run them in parallel using OpenMP. Every realisation starts from the same initial conditions and then at each time step a process happens with a given probability, and to determine which process I draw a random number from a uniform distribution.
I want to make sure that all simulations are statistically independent and that there is no overlap in the random numbers that are being drawn.
I use OpenMP to parallelise the for loops, so the skeleton code looks like this:
vector<int> data(number_of_sims);
double t;
double r;
#pragma omp parallel for
for(int i = 0; i < number_of_sims; i++){
// run sim
t = 0;
while (t < T) {
r = draw_random_uniform();
if (r < p) do_something();
else do_something_else();
t += 1.0; // increment time
}
// some calculation
data[i] = calculate();
}
So every time I want a random number, I would call a function which used the Mersenne Twister seeded with random device.
double draw_random_uniform(){
static thread_local auto seed = std::random_device{}();
static thread_local mt19937 mt(seed);
std::uniform_real_distribution<double> distribution(0.0, 1.0);
double r = distribution(mt);
return r;
}
However, since I ultimately want to run this code on a high power computing cluster I want to avoid using std::random_device()
as it is risky for systems with little entropy.
So instead I want to create an initial random number generator and then jump it forward a large amount for each of the threads. I have been attempting to do this with the Xoroshiro256+ PRNG (I found some good implementation here: https://github.com/Reputeless/Xoshiro-cpp). Something like this for example:
XoshiroCpp::Xoshiro256Plus prng(42); // properly seeded prng
#pragma omp parallel num_threads()
{
static thread_local XoshiroCpp::Xoshiro256Plus lprng(prng); // thread local copy
lprng.longJump(); // jump ahead
// code as before, except use lprng to generate random numbers
# pragma omp for
....
}
However, I cannot get such an implementation to work. I suspect because of the double OpenMP for loops. I had the thought of pre-generating all of the PNRGs and storing in a container, then accessing the relevant one by using omp_get_thread_num()
inside the parallelised for loop.
I am unsure if this is the best way to go about doing all this. Any advice is appreciated.