OpenMP parallelisation of pi calculation is either slow or wrong

Question

I'm having trouble parallelising my monte carlo method to calculate pi. Here is the parallelised for-loop:

#pragma omp parallel for private(i,x,y) schedule(static) reduction(+:count)
  for (i = 0; i < points; i++) {
    x = rand()/(RAND_MAX+1.0)*2 - 1.0;
    y = rand()/(RAND_MAX+1.0)*2 - 1.0;

    // Check if point lies in circle
    if(x*x + y*y < 1.0) { count++; }
  }

The problem is, it underestimates pi if I use schedule(static), and its slower than the serial implementation if I use schedule(dynamic). What am I doing wrong? I've tried other ways to fix it (like this: Using OpenMP to calculate the value of PI) but it's still much slower than the serial implementation.

Thanks in advance

@Mystical: looks like it isn't: http://stackoverflow.com/questions/6161322/using-rand-with-multiple-threads-in-c — ev-br, Jan 24 '12 at 17:25
I suspect @Mysticial has the right idea. `rand` will normally have an internal "seed", which is basically acting as a shared resource, forcing serialization at every call (or else risking incorrect results). If you have it available, I'd try using `rand_r` or (preferably) `drand48_r` instead. Alternatively, consider the random number generation classes introduced in C++11 -- each instance has its own state, which should avoid serialization (but can make initialization tricky -- multiple threads creating identical sequences will do little good). — Jerry Coffin, Jan 24 '12 at 17:31

score 7 · Answer 1 · answered Jan 24 '12 at 17:30

Assuming you're using the C library rand function, that function is not reentrant or thread-safe. POSIX provides a rand_r function, but (to quote the glibc documentation):

POSIX.1 extended the C standard functions to support reproducible random numbers in multi-threaded programs. However, the extension is badly designed and unsuitable for serious work.

In particular, the seed must be an unsigned int, which doesn't have enough bits for a good PRNG. They recommend using the SVID random number functions, of which nrand48_r is probably what you're looking for.

Alternatively, you could use a different library.

How do I use nrand48_r? It doesn't appear to be in the standard library. I can use nrand48(&seed), where seed is a short unsigned integer. But I still get bad performance if I use a different seed for each thread — Eddy, Jan 25 '12 at 16:34

Stephan Dollberg · Answer 2 · 2012-01-24T18:00:29.583

1

One thing you have to take into account when doing such things in parallel, is that there might be different rounding errors, caused due to different ways of doing the calculation.

Example:

((A+B) + (C+D)) where (A+B) and (C+D) will be computed in parallel might vary from the serial approach (((A+B) + C) + D).

edited Jan 24 '12 at 18:00

answered Jan 24 '12 at 17:54

Stephan Dollberg

32,985
16
81
107

OpenMP parallelisation of pi calculation is either slow or wrong

2 Answers2

Linked