OpenMP code to find PI takes more time than the serial method

Question

I am using OpenMP to speed up a Monte Carlo approach to estimating PI. All I did was add the pragma directive to the sequential code. The code is as follows.

float host_monte_carlo_parallel(long trials, int noOfThreads) {
    float x, y;
    long points_in_circle = 0; // must start at zero: the reduction adds the per-thread counts into it
    long i;

#pragma omp parallel for num_threads(noOfThreads) private(i, x, y) reduction(+:points_in_circle)
    for (i = 0; i < trials; i++) {
        x = rand() / (float) RAND_MAX;
        y = rand() / (float) RAND_MAX;
        //printf("%ld\n", i);
        points_in_circle += (x * x + y * y <= 1.0f);
    }

    return 4.0f * points_in_circle / trials;
}

The problem is that the sequential code finishes far sooner than the parallel one. Am I using the pragma correctly? The running times are approximately as follows.

CPU pi calculated in 6.413644 s.
CPU parallel pi calculated in 203.746460 s.

Not sure if the problem is related to the calls to `rand()`. — Rajith Gun Hewage, Jan 05 '16 at 05:03
Probably is related to `rand()` because `rand()` must maintain state information that's shared across all threads. Hence, only one thread at a time can call `rand()`. To test, replace `rand()` with `r++`, where `r` is just an `int` that you're incrementing. Try with `r` as a private variable, and as a shared variable. — user3386109, Jan 05 '16 at 05:26
@user3386109 I kind of didn't get how I can use this to measure performance against `rand()`. The elapsed times I got for the cases where `r` was private and shared were almost identical. — Rajith Gun Hewage, Jan 05 '16 at 06:10
@HighPerformanceMark I used 4 threads which run on 4 physical cores. I timed the execution using `time.h`'s `clock()` function. — Rajith Gun Hewage, Jan 05 '16 at 06:23
See http://stackoverflow.com/questions/10673732/openmp-time-and-clock-calculates-two-different-results — High Performance Mark, Jan 05 '16 at 06:36
And on the `rand` issue, see http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one/10625090 — High Performance Mark, Jan 05 '16 at 12:13
Using multiple CPUs or even multiple threads will result in the execution time taking longer, unless each CPU/thread is being blocked by I/O (or something similar) while the other CPUs/threads can continue. — user3629249, Jan 06 '16 at 09:45
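
The `rand()` suggestion in the comments can be tried with a sketch like the one below. It is not the asker's code: it assumes the POSIX function `rand_r()` is available, and the function name `monte_carlo_pi_parallel` and the seeding scheme (a fixed base plus the thread number) are made up for illustration. Each thread keeps its own generator state in a local `seed`, so no state is shared between threads.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() on strict compilers */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical variant of host_monte_carlo_parallel: every thread owns its
   generator state, so calls to the generator never touch shared data. */
double monte_carlo_pi_parallel(long trials, int num_threads) {
    long points_in_circle = 0;

    #pragma omp parallel num_threads(num_threads) reduction(+:points_in_circle)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Assumed seeding scheme: fixed base plus a per-thread offset. */
        unsigned int seed = 12345u + 7919u * (unsigned int)tid;

        /* The loop variable and x, y are declared inside the region,
           so they are private without a private() clause. */
        #pragma omp for
        for (long i = 0; i < trials; i++) {
            double x = rand_r(&seed) / (double)RAND_MAX;
            double y = rand_r(&seed) / (double)RAND_MAX;
            points_in_circle += (x * x + y * y <= 1.0);
        }
    }

    return 4.0 * (double)points_in_circle / (double)trials;
}
```

When timing this, note that `clock()` reports CPU time for the whole process, which adds up across threads, so it can make a parallel run look slower than it is. `omp_get_wtime()` called before and after the function returns wall-clock seconds instead. `rand_r()` is a weak generator, so this is only meant to show the effect of removing the shared state, not to recommend it for serious simulations.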
