I am using OpenMP to speed up a Monte Carlo estimate of pi. All I did was add a pragma to the sequential code. The code is as follows.

float host_monte_carlo_parallel(long trials, int noOfThreads) {
    float x, y;
    long points_in_circle = 0;  /* must start at 0: the reduction adds into this original value */
    long i;

#pragma omp parallel for num_threads(noOfThreads) private(i, x, y) reduction(+:points_in_circle)
    for (i = 0; i < trials; i++) {
        x = rand() / (float) RAND_MAX;
        y = rand() / (float) RAND_MAX;
        //printf("%ld\n", i);
        points_in_circle += (x * x + y * y <= 1.0f);
    }

    return 4.0f * points_in_circle / trials;
}

The problem is that the sequential code runs far faster than the parallel one. Am I using the pragma correctly? The running times are approximately as follows.

CPU pi calculated in 6.413644 s.
CPU parallel pi calculated in 203.746460 s.
  • Not sure if the problem is related to the calls to `rand()`. – Rajith Gun Hewage Jan 05 '16 at 05:03
  • Probably is related to `rand()` because `rand()` must maintain state information that's shared across all threads. Hence, only one thread at a time can call `rand()`. To test, replace `rand()` with `r++`, where `r` is just an `int` that you're incrementing. Try with `r` as a private variable, and as a shared variable. – user3386109 Jan 05 '16 at 05:26
  • @user3386109 I kind of didn't get how I can use this to measure performance against `rand()`. The elapsed times I got for the cases where `r` was private and shared were almost identical. – Rajith Gun Hewage Jan 05 '16 at 06:10
  • @HighPerformanceMark I used 4 threads which run on 4 physical cores. I timed the execution using `time.h`'s `clock()` function. – Rajith Gun Hewage Jan 05 '16 at 06:23
  • See http://stackoverflow.com/questions/10673732/openmp-time-and-clock-calculates-two-different-results – High Performance Mark Jan 05 '16 at 06:36
  • And on the `rand` issue, see http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one/10625090 – High Performance Mark Jan 05 '16 at 12:13
  • Using multiple CPUs or even multiple threads will result in the execution time taking longer, unless each CPU/thread is being blocked by I/O (or something similar) while the other CPUs/threads can continue. – user3629249 Jan 06 '16 at 09:45
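
The shared-state concern about `rand()` raised in the comments can be sidestepped by giving each thread its own generator state. The following is a minimal sketch of that idea, not the asker's code: it swaps `rand()` for a small SplitMix64 generator, and the names `splitmix64_next`, `next_unit_float` and `monte_carlo_pi_private_rng`, as well as the per-thread seeding scheme, are assumptions made for illustration. The seeding is deliberately simplistic and is not meant for statistically demanding work.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: one SplitMix64 step. All state lives in *state,
   so nothing is shared between threads. */
static uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical helper: uniform float in [0, 1) built from the top 24 bits. */
static float next_unit_float(uint64_t *state) {
    return (float)(splitmix64_next(state) >> 40) * (1.0f / 16777216.0f);
}

/* Monte Carlo estimate of pi where every thread owns its generator state. */
double monte_carlo_pi_private_rng(long trials, int num_threads) {
    long points_in_circle = 0;  /* the reduction adds into this value */

    assert(trials > 0 && num_threads > 0);

#pragma omp parallel num_threads(num_threads) reduction(+:points_in_circle)
    {
        int tid = 0;
        uint64_t state;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Assumed seeding scheme: a fixed base seed offset by the thread id. */
        state = 0x1234ABCDULL ^ ((uint64_t)(tid + 1) << 32);

#pragma omp for
        for (long i = 0; i < trials; i++) {
            float x = next_unit_float(&state);
            float y = next_unit_float(&state);
            points_in_circle += (x * x + y * y <= 1.0f);
        }
    }

    return 4.0 * (double)points_in_circle / (double)trials;
}
```

With GCC or Clang this would be built with `-fopenmp`; without that flag the pragmas are ignored and the loop runs on one thread. When comparing run times, a wall-clock timer such as `omp_get_wtime()` is the safer choice, because on many platforms `clock()` reports CPU time accumulated across all threads, as the linked question on `time` versus `clock` discusses.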
