
I have a C program that uses OpenMP, shown below. It approximates pi using a given number of steps. I am new to OpenMP, so my knowledge is limited.

I'm trying to add a barrier to this program, but I believe one is already implicit, so I'm not sure whether I need an explicit one at all.

Thank you!

#include <omp.h>
#include <stdio.h>
#define NUM_THREADS 4
static long num_steps = 100000000;
double step;

int main()
{
    int i;
    double start_time, run_time, pi, sum[NUM_THREADS];
    omp_set_num_threads(NUM_THREADS);
    step = 1.0 / (double)num_steps;

    start_time = omp_get_wtime();

#pragma omp parallel 
    {
        int i, id, currentThread;
        double x;
        id = omp_get_thread_num();
        currentThread = omp_get_num_threads();
        for (i = id, sum[id] = 0.0; i < num_steps; i = i + currentThread)
        {
            x = (i + 0.5) * step;

            sum[id] = sum[id] + 4.0 / (1.0 + x * x);
        }
    }

    run_time = omp_get_wtime() - start_time;
    // then combine the per-thread partial sums to get the value of pi
    for (i = 0, pi = 0.0; i < NUM_THREADS; i++)
    {
        pi = pi + sum[i] * step;
    }
    printf("\n pi with %ld steps is %lf \n ", num_steps, pi);
    printf("run time = %6.6f seconds\n", run_time);
}
JasDj
  • Your code looks plausible. What's the problem? – Victor Eijkhout Oct 15 '22 at 22:37
  • Your code has a performance issue due to false sharing and is unnecessarily complicated. One of the main advantages of OpenMP is that there is no need for manual reduction. – Laci Oct 16 '22 at 00:41
  • There is an implicit barrier at the end of the section. As for the reduction mentioned by Laci, consider reading https://stackoverflow.com/questions/13290245/reduction-with-openmp for example. – Jérôme Richard Oct 16 '22 at 00:52

1 Answer


In your case there is no need for an explicit barrier: there is an implicit barrier at the end of the parallel region, so every thread has finished writing its element of sum before the final loop reads it.
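To illustrate the difference, here is a minimal sketch of a case where an explicit barrier is needed, next to the implicit one at the end of the region. The function name `barrier_demo`, the `ready` array, and `MAX_THREADS` are made up for this example and do not appear in the question. The `#ifdef _OPENMP` fallbacks are only there so the sketch still builds (serially) when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without -fopenmp). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64 /* hypothetical upper bound for this sketch */

/* Returns 1 if every thread saw its neighbour's phase-1 write. */
int barrier_demo(void)
{
    int ready[MAX_THREADS] = {0};
    int ok = 0;       /* number of threads whose check succeeded */
    int nthreads = 1; /* actual team size, recorded by thread 0 */

    #pragma omp parallel num_threads(4) reduction(+:ok)
    {
        const int id = omp_get_thread_num();
        const int n = omp_get_num_threads();
        assert(n <= MAX_THREADS);
        if (id == 0)
            nthreads = n;

        ready[id] = id + 1; /* phase 1: each thread writes only its own slot */

        /* Explicit barrier: phase 2 reads a slot written by ANOTHER thread,
           so all phase-1 writes must be complete first. */
        #pragma omp barrier

        const int neighbour = (id + 1) % n;
        if (ready[neighbour] == neighbour + 1)
            ok += 1;
    } /* implicit barrier: all threads are done before execution continues */

    return ok == nthreads;
}
```

In the pi program each thread only ever touches its own `sum[id]` inside the region, and the combining loop runs after the region, so only the implicit barrier is involved.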

Your code, however, has a performance issue. Different threads update adjacent elements of the sum array, which can cause false sharing:

When multiple threads access same cache line and at least one of them writes to it, it causes costly invalidation misses and upgrades.

To avoid it, you would have to make sure that each element of the sum array is located on a different cache line. There is a simpler solution, though: use OpenMP's reduction clause. Please check the example suggested by @JeromeRichard (https://stackoverflow.com/questions/13290245/reduction-with-openmp). Using reduction, your code should look something like this:

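    double sum = 0.0;
    // each thread accumulates into a private copy of sum;
    // OpenMP combines the copies with + when the loop ends
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++)  // long, to match the type of num_steps
    {
        const double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    const double pi = sum * step;  // scale the accumulated sum by the step width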
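For comparison, the padding approach mentioned above (one cache line per partial sum) could be sketched roughly as follows. The 64-byte cache line size is an assumption about the target CPU, and `padded_sum`, `CACHE_LINE`, and `pi_padded` are names invented for this example. As before, the `#ifdef _OPENMP` fallbacks only exist so the sketch builds serially without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without -fopenmp). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define NUM_THREADS 4
#define CACHE_LINE 64 /* assumed cache line size in bytes; check your CPU */

/* Each element is CACHE_LINE bytes wide, so the 'value' fields of two
   different elements can never share a cache line. */
typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
} padded_sum;

double pi_padded(long num_steps)
{
    assert(num_steps > 0);
    const double step = 1.0 / (double)num_steps;
    padded_sum sum[NUM_THREADS] = {{0}}; /* all partial sums start at zero */

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        const int id = omp_get_thread_num();
        const int n = omp_get_num_threads(); /* may be fewer than requested */
        assert(n <= NUM_THREADS);

        for (long i = id; i < num_steps; i += n) {
            const double x = (i + 0.5) * step;
            sum[id].value += 4.0 / (1.0 + x * x);
        }
    } /* implicit barrier: all partial sums are complete here */

    double pi = 0.0;
    for (int t = 0; t < NUM_THREADS; t++)
        pi += sum[t].value * step; /* unused slots contribute zero */
    return pi;
}
```

This keeps the manual per-thread bookkeeping of the original code, which is why the reduction clause is usually the better choice.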

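Note also that you should declare variables in the smallest scope that needs them. Variables declared inside a parallel region or loop body are automatically private to each thread, which removes a common source of data races.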
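Putting the reduction and the scoping advice together, the computation could be wrapped in a function along these lines. This is a sketch, not the only way to structure it; `compute_pi` is a name chosen for this example, and `step` is made a local constant instead of the global used in the question.

```c
#include <assert.h>

/* Midpoint-rule approximation of the integral of 4/(1+x^2) over [0,1]. */
double compute_pi(long num_steps)
{
    assert(num_steps > 0);
    const double step = 1.0 / (double)num_steps; /* local, read-only */
    double sum = 0.0; /* the only shared variable, handled by the reduction */

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) { /* loop index is private */
        const double x = (i + 0.5) * step; /* declared in the body: private */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

Calling `compute_pi(100000000)` from `main` and timing it with `omp_get_wtime()` would reproduce the original program's output. Without OpenMP enabled the pragma is ignored and the same function simply runs serially.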

Laci