
I am trying to learn parallelization with OpenMP.
This code performs two-dimensional Monte Carlo integration. Execution time (with OpenMP) is about 25 seconds for N = 900 million.

#include <cmath>
#include <cstdint>
#include <iostream>
#include <omp.h>
#include <random>

double func(double x, double y) {
    return sqrt(x+y);
}

using std::cout;
using std::endl;


const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
const uint64_t N = 900000000; 

int main() {
    double sum = 0.0;
    std::random_device rd; 

#pragma omp parallel
    {
        uint32_t seed;

#pragma omp critical
        {
            seed = rd();
        }

        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
#pragma omp for reduction(+ : sum)
        for (int64_t i = 0; i < N; i++) { 
         
            double x = a + (b - a) * dist(gen);
            double y = c + (d - c) * dist(gen);
            
            
            sum += func(x, y);
            // sum - local for each thread
            // initialized to 0 in each thread
            // and will be added to the global sum when exiting the loop
            
        }
    }

    // The result is calculated after the parallel section

    double ans = sum * (b - a) * (d - c)/ N;
    cout << ans<<endl;
    return 0;
}

How can I convert this to thread-safe code (professor said that it is not thread-safe)?

DrRed_LZ
  • Code not being thread-safe means it can read/write shared resources (variables, files, etc.) at the same time. To make it thread-safe, you need to add some kind of locking, like a mutex, while accessing shared resources. – Mayur Dec 21 '21 at 17:27
  • I think you need to add the `reduction` clause to the outer `omp parallel` pragma, and adjust the divisor of `ans` by the appropriate number of times that parallel block gets executed. – 1201ProgramAlarm Dec 21 '21 at 17:41
  • The problem is with the random number generator. It has global data, so your access to it is not safe. Fortunately, C++ has a `thread_local` keyword. So ```thread_local std::mt19937 gen(seed);``` will (or may) make the code thread safe. – Victor Eijkhout Dec 21 '21 at 18:03
  • @VictorEijkhout Read again. The generator is created inside the parallel section. It is already thread-local. – Homer512 Dec 21 '21 at 18:06
  • @Homer512 How do you know the generator does not use global or static variables? Declaring inside the parallel region is not enough to prevent that. – Victor Eijkhout Dec 21 '21 at 18:53
  • @VictorEijkhout C++11 standard section 17.6.5.9 Data race avoidance: "A C++ standard library function shall not directly or indirectly access objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s arguments, including ```this```. […] This means, for example, that implementations can’t use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads." – Homer512 Dec 21 '21 at 22:02
  • @Homer512 Thanks for the further explanations. The standard indeed seems to imply that the standard library these days is safe. – Victor Eijkhout Dec 21 '21 at 23:14

1 Answer


As far as I can see, your professor is mistaken. The code is thread-safe. The reduction clause is written correctly.

The only issue I have is with the way you initialize the random number generators.

  1. It is inefficient to call into random_device once per thread.
  2. It opens up the chance of a birthday problem. There is a (very minor) chance that two threads start with the same random state because they receive the same seed value.

The more efficient way to create multiple random sequences is to increment the seed by 1 per thread. A good random number generator will convert this to vastly different random values. See for example here: https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

Another minor issue is that you can move the reduction to the end of the parallel section. Then you can declare the for-loop as nowait to avoid one implicit openmp barrier at the end of the loop. This optimization is possible because you do not need the final sum inside the parallel section. However, this change is only an optimization. It does not change the correctness of your code.

The code would look something like this:

#include <cmath>
// using std::sqrt
#include <random>
// using std::random_device, std::mt19937, std::uniform_real_distribution
#include <iostream>
// using std::cout
#include <omp.h>
// using omp_get_thread_num


int main() {
    const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
    const int64_t N = 900000000;
    double sum = 0.0;
    std::random_device rd;
    // initial seed
    const std::random_device::result_type seed = rd();
#   pragma omp parallel reduction(+ : sum)
    {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        /*
        * We add the thread number to the seed. This is better than using multiple
        * random numbers because it avoids the birthday problem. See for example
        * https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
        */
        std::mt19937 gen(seed + static_cast<unsigned>(omp_get_thread_num()));
#       pragma omp for nowait
        for(int64_t i = 0; i < N; ++i) { 
            const double x = a + (b - a) * dist(gen);
            const double y = c + (d - c) * dist(gen);
            sum += std::sqrt(x + y);
        }
    }
    const double ans = sum * (b - a) * (d - c)/ N;
    std::cout << ans << '\n';
    return 0;
}

Also note how you can indent the pragmas to improve readability. Just keep the # at the start of the line.

Relation to similar question

As Victor brought up, there is a similar question here: Is mersenne twister thread safe for cpp

The question there was whether this code can be called safely from multiple threads:

int f() {
    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}

The answer to that is yes, as pointed out in the answer there:

No C++ std type uses global data in a non-thread-safe way.

However, creating a new mt19937 just to compute a single integer is pointless. So the answer there suggests ways to reuse an mt19937 instance multiple times. But reuse requires thread-safety because

By default, one instance of a type cannot be accessed from two threads without synchronization.

The easiest answer there is to use a thread-local instance. We use a similar approach here by using one instance per thread in the parallel section.

There are a few key differences between my solution and the one proposed there:

  • The other solution still suffers from the birthday problem. However, that issue is minor unless you create a ton of threads, so it is forgivable (IMHO). You can also avoid it with a different initialization strategy. For example, this would work:
static std::atomic<std::default_random_engine::result_type> seed(
      std::default_random_engine{}());
thread_local std::mt19937 rnd(
      seed.fetch_add(1, std::memory_order_relaxed));
  • The mt19937 there may live significantly longer and be reused more if you reuse the threads. This can be a benefit. But in the question here, the threads are not reused. Also, we use the RNG for a significant time before discarding it, so the benefit of using it even longer is less significant than in the other question.
  • There may be hidden costs associated with thread-local data. An mt19937 has significant memory associated with it, so I suggest not over-using thread-local storage if not required. Plus, like any hidden global state, it is a bit of a code smell. Direct control over initialization may be preferable, e.g. for testing.

Initializing Mersenne Twisters

I should also mention that initializing a Mersenne Twister, with its huge internal state, from just a single integer seed value may be less than ideal. C++'s solution of seed_seq may also be flawed, but it is the best we've got. So here is an outline of the code as I would write it:

    // additionally requires <array>, <algorithm>, and <functional>
    std::array<std::mt19937::result_type, std::mt19937::state_size> seed;
    std::random_device rdev;
    std::generate(seed.begin(), seed.end(), std::ref(rdev));
#   pragma omp parallel firstprivate(seed)
    {
      // each thread perturbs its private copy of the seed material
      seed.front() += static_cast<unsigned>(omp_get_thread_num());
      std::seed_seq sseq(seed.begin(), seed.end());
      std::mt19937 rnd(sseq);
      // ... use rnd as before ...
    }

Use of MT19937 for Monte Carlo

I also need to mention this little note on Wikipedia

Multiple instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values.[44][45]

[44] Makoto Matsumoto; Takuji Nishimura. "Dynamic Creation of Pseudorandom Number Generators" (PDF). Retrieved 19 July 2015.

[45] Hiroshi Haramoto; Makoto Matsumoto; Takuji Nishimura; François Panneton; Pierre L'Ecuyer. "Efficient Jump Ahead for F2-Linear Random Number Generators" (PDF). Retrieved 12 Nov 2015.

Homer512
  • Thank you for your help and advice. This will really improve the code, but I have a question related to this line: `std::mt19937 gen(seed + static_cast<unsigned>(omp_get_thread_num()));` Why do we add `static_cast<unsigned>(omp_get_thread_num())` to the seed? – DrRed_LZ Dec 21 '21 at 18:33
  • Reading up on this issue I get conflicting information. For instance: https://stackoverflow.com/questions/40655814/is-mersenne-twister-thread-safe-for-cpp. Can you guarantee that the mersenne twister does not use global variables? – Victor Eijkhout Dec 21 '21 at 18:55
  • @VictorEijkhout I've updated my answer. And yes, I can guarantee that the MT does not use global state. The answer to the other question is written in a somewhat confusing way, but it agrees with me on that point – Homer512 Dec 21 '21 at 21:55