As far as I can see, your professor is mistaken. The code is thread-safe. The reduction clause is written correctly.
The only issue I have is with the way you initialize the random number generators.
- It is inefficient to call into `random_device` once per thread.
- It opens up the chance of a birthday problem: there is a (very minor) chance that two threads start with the same random state because they receive the same random value.
The more efficient way to create multiple random sequences is to increment the seed by 1 per thread. A good random number generator will turn consecutive seeds into vastly different sequences. See for example here:
https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
Another minor issue is that you can move the reduction to the end of the parallel section. Then you can declare the for loop as `nowait` to avoid one implicit OpenMP barrier at the end of the loop. This optimization is possible because you do not need the final sum inside the parallel section. However, this change is only an optimization; it does not affect the correctness of your code.
The code would look something like this:
```cpp
#include <cmath>
// using std::sqrt
#include <cstdint>
// using std::int64_t
#include <random>
// using std::random_device, std::mt19937, std::uniform_real_distribution
#include <iostream>
// using std::cout
#include <omp.h>
// using omp_get_thread_num

int main() {
    const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
    const std::int64_t N = 900000000;
    double sum = 0.0;
    std::random_device rd;
    // initial seed
    const std::random_device::result_type seed = rd();
#   pragma omp parallel reduction(+ : sum)
    {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        /*
         * We add the thread number to the seed. This is better than using
         * multiple random numbers because it avoids the birthday problem.
         * See for example
         * https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
         */
        std::mt19937 gen(seed + static_cast<unsigned>(omp_get_thread_num()));
#       pragma omp for nowait
        for(std::int64_t i = 0; i < N; ++i) {
            const double x = a + (b - a) * dist(gen);
            const double y = c + (d - c) * dist(gen);
            sum += std::sqrt(x + y);
        }
    }
    const double ans = sum * (b - a) * (d - c) / N;
    std::cout << ans << '\n';
    return 0;
}
```
Also note how you can indent the pragmas to improve readability. Just keep the # at the start of the line.
Relation to similar question
As Victor brought up, there is a similar question here:
Is mersenne twister thread safe for cpp
The question there was whether this code can be used in a thread-safe way:
```cpp
int f() {
    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}
```
The answer to that is yes, as pointed out in the answer there:

> No C++ std type uses global data in a non-thread-safe way.
However, creating a new mt19937 just to compute a single integer is wasteful. So the answer there suggests ways to reuse an mt19937 instance across calls. But reuse requires thread-safety because

> By default, one instance of a type cannot be accessed from two threads without synchronization.
The easiest answer there is to use a thread-local instance. We use a similar approach here by using one instance per thread in the parallel section.
There are a few key differences between my solution and the one proposed there:
- The other solution still suffers from the birthday problem. However, that issue is minor unless you create a ton of threads, so it is forgivable (IMHO). Also, you can avoid it with a different initialization strategy. For example, this would work:

  ```cpp
  static std::atomic<std::default_random_engine::result_type> seed(
      std::default_random_engine{}());
  thread_local std::mt19937 rnd(
      seed.fetch_add(1, std::memory_order_relaxed));
  ```
- The mt19937 there may live significantly longer and be reused more if you reuse the threads. This can be a benefit. But in the question here, the threads are not reused. Also, we use the RNG for a significant time before discarding it, so the benefit of using it even longer is less significant than in the other question.
- There may be hidden costs associated with thread-local data. An mt19937 has significant memory associated with it, so I suggest not over-using thread-local storage unless required. Plus, like any hidden global state, it is a bit of a code smell. Direct control over initialization may be preferable, e.g. for testing.
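To make that last point concrete, here is a minimal sketch (not from the original code; `sample_sqrt` is a hypothetical name) of taking the engine by reference instead of owning a thread-local one. A test can then inject a fixed-seed engine and get reproducible results:

```cpp
#include <cassert>
#include <cmath>
// using std::sqrt
#include <random>
// using std::mt19937, std::uniform_real_distribution

// Hypothetical helper: the sampling code receives the engine by reference
// instead of hiding a thread_local instance, so callers control seeding.
template <class Rng>
double sample_sqrt(Rng& gen, double a, double b, double c, double d) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const double x = a + (b - a) * dist(gen);
    const double y = c + (d - c) * dist(gen);
    return std::sqrt(x + y);
}
```

Two engines constructed with the same seed now produce identical samples, which is exactly what a regression test wants.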
Initializing Mersenne Twisters
I should also mention that initializing a Mersenne Twister's huge internal state from just a single integer seed value may be less than ideal. C++'s solution, std::seed_seq, may also be flawed, but it is the best we've got. So here is an outline of the code as I would write it:
```cpp
std::array<std::mt19937::result_type, std::mt19937::state_size> seed;
std::random_device rdev;
std::generate(seed.begin(), seed.end(), std::ref(rdev));
#   pragma omp parallel firstprivate(seed)
{
    seed.front() += static_cast<unsigned>(omp_get_thread_num());
    std::seed_seq sseq(seed.begin(), seed.end());
    std::mt19937 rnd(sseq);
}
```
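For illustration, here is a runnable, single-threaded sketch of the same seeding strategy, with the OpenMP parallel region replaced by a plain loop over simulated thread numbers (the function name `estimate` and the thread and sample counts are my own illustrative choices, not from the original):

```cpp
#include <algorithm>
// using std::generate
#include <array>
// using std::array
#include <cassert>
#include <cmath>
// using std::sqrt
#include <functional>
// using std::ref
#include <random>
// using std::random_device, std::mt19937, std::seed_seq, std::uniform_real_distribution

// Single-threaded stand-in for the OpenMP version: each simulated "thread" t
// copies the shared seed array (like firstprivate), perturbs one word of it,
// and builds its own fully seeded mt19937.
double estimate(int num_threads, long long n_per_thread) {
    std::array<std::mt19937::result_type, std::mt19937::state_size> seed;
    std::random_device rdev;
    std::generate(seed.begin(), seed.end(), std::ref(rdev));

    const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
    double sum = 0.0;
    for (int t = 0; t < num_threads; ++t) {
        auto s = seed;  // the firstprivate copy
        s.front() += static_cast<unsigned>(t);
        std::seed_seq sseq(s.begin(), s.end());
        std::mt19937 rnd(sseq);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        for (long long i = 0; i < n_per_thread; ++i) {
            const double x = a + (b - a) * dist(rnd);
            const double y = c + (d - c) * dist(rnd);
            sum += std::sqrt(x + y);
        }
    }
    return sum * (b - a) * (d - c) / (num_threads * n_per_thread);
}
```

With a few hundred thousand samples the estimate lands close to the analytic value of the integral (about 4.23).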
Use of MT19937 for Monte Carlo
I also need to mention this little note on Wikipedia:

> Multiple instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values.[44][45]
[44] Makoto Matsumoto; Takuji Nishimura. "Dynamic Creation of Pseudorandom Number Generators" (PDF). Retrieved 19 July 2015.
[45] Hiroshi Haramoto; Makoto Matsumoto; Takuji Nishimura; François Panneton; Pierre L'Ecuyer. "Efficient Jump Ahead for F2-Linear Random Number Generators" (PDF). Retrieved 12 November 2015.
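If you want streams that are disjoint by construction rather than merely differently seeded, one workaround (my sketch, not something from the answer above) is block splitting with std::mt19937::discard: every worker skips ahead to its own block of a single master stream. Keep in mind that discard is typically O(n) for mt19937, so this only pays off when skipping is cheap relative to the work done per draw:

```cpp
#include <cassert>
#include <cstdint>
// using std::uint32_t, std::uint64_t
#include <random>
// using std::mt19937
#include <vector>
// using std::vector

// Worker `worker` skips the first worker * block_size draws of the master
// stream and then takes the next block_size draws. By the definition of
// discard, this yields exactly the block that worker would see if all draws
// came from one sequential mt19937.
std::vector<std::uint32_t> worker_block(std::uint32_t master_seed, int worker,
                                        std::uint64_t block_size) {
    std::mt19937 gen(master_seed);
    gen.discard(static_cast<std::uint64_t>(worker) * block_size);
    std::vector<std::uint32_t> out;
    out.reserve(block_size);
    for (std::uint64_t i = 0; i < block_size; ++i)
        out.push_back(static_cast<std::uint32_t>(gen()));
    return out;
}
```

Concatenating the workers' blocks reproduces the sequential stream, so the per-worker subsequences cannot overlap.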