I've just started programming with OpenMP and I'm trying to parallelize a for loop that updates a variable I need after the loop. Something like this:

float a = 0;
for (int i = 0; i < x; i++)
{
    int x = algorithm();
    /* On each iteration, x has a different value */
    a = a + x;
}
cout << a;

I think the variable a has to be local to each thread. After the threads have finished their work, all the local copies of a should be added into one final result.

How can I do that?

YSC
EndergirlPG

3 Answers

Put the #pragma omp parallel for directive with a reduction(+:a) clause immediately before the for loop.

Variables declared inside the for loop are local to each thread, as are loop counters. Variables declared outside the #pragma omp parallel block are shared by default unless specified otherwise (see the shared, private, and firstprivate clauses). Take care when updating shared variables, because a race condition may occur. Here, the reduction(+:a) clause says that a is a shared variable that accumulates a sum across iterations. Each thread accumulates into its own private copy of a, initialized to 0, and the copies are safely added to the original a at the end of the loop.

The two snippets below are equivalent:

float a = 0.0f;
int n = 1000;
#pragma omp parallel shared(a) // spawn the threads
{
    float acc = 0; // accumulator local to each thread
#pragma omp for // iterations are divided among the threads
    for (int i = 0; i < n; i++) {
        float x = algorithm(i); // do something
        acc += x; // increment the local accumulator
    } // for
#pragma omp atomic
    a += acc; // atomic update of the shared accumulator: one thread at a time
} // end of parallel region, back to a single thread
cout << a;

Is equivalent to:

float a = 0.0f;
int n = 1000;
#pragma omp parallel for reduction(+:a)
for (int i = 0; i < n; i++) {
    float x = algorithm(i);
    a += x;
} // parallel for
cout << a;

Note that the loop bound cannot be a variable declared inside the loop body: in a condition such as i < x, x must be declared before the loop and must not change while the loop runs.
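
For anyone who wants to compile and compare the two variants, here is a minimal self-contained sketch. The body of algorithm() is a made-up stand-in (the question does not show the real one), and the function names sum_manual and sum_reduction are only illustrative. With GCC or Clang, OpenMP is enabled with -fopenmp; without that flag the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cassert>

// Hypothetical stand-in for the asker's algorithm(). It returns small whole
// numbers so the float sum is exact regardless of the order of additions.
float algorithm(int i) { return static_cast<float>(i % 3); }

// Variant 1: per-thread accumulator, combined with an atomic update.
float sum_manual(int n) {
    assert(n >= 0);
    float a = 0.0f;                 // shared result
    #pragma omp parallel shared(a)
    {
        float acc = 0.0f;           // private to each thread
        #pragma omp for
        for (int i = 0; i < n; i++)
            acc += algorithm(i);
        #pragma omp atomic
        a += acc;                   // one thread at a time
    }
    return a;
}

// Variant 2: let the reduction clause do the same bookkeeping.
float sum_reduction(int n) {
    assert(n >= 0);
    float a = 0.0f;
    #pragma omp parallel for reduction(+:a)
    for (int i = 0; i < n; i++)
        a += algorithm(i);
    return a;
}
```

Both functions return the same value for this stand-in. With a real algorithm() that returns arbitrary floats, the result can differ in the last digits from run to run, because the order in which the partial sums are combined is not fixed.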

Brice

There are several ways to achieve this, but the simplest is to use an OpenMP parallel reduction:

float a = 0.0f;
#pragma omp parallel for reduction(+:a)
for(int i = 0; i < x; i++) 
  a += algorithm();
cout << a;
Daniel Langr
  • OMG Thank you so much! And what happens if 'a' is an array instead of a float? I mean, for example, if each iteration modifies one position of the array, and the algorithm decides which position (so the same position can be modified many times)? – EndergirlPG Nov 07 '18 at 10:16
  • Yes, it's possible since OpenMP 4.5. Many questions have been asked about this topic; see, e.g., [Reducing on array in OpenMP](https://stackoverflow.com/q/20413995/580083). – Daniel Langr Nov 07 '18 at 10:18
  • @EndergirlPG BTW in case of very large arrays, creating a _thread-local_ temporary array for each thread might not be possible. Then, you basically need to update the shared array (e.g., by using atomic updates). However, you should use some really clever (cache-blocked) access to this output array, mainly to prevent false sharing. Otherwise, the performance would be very low. – Daniel Langr Nov 07 '18 at 10:51
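
The two approaches mentioned in these comments can be sketched as follows. This is only an illustration under assumptions: slot_for() is a hypothetical stand-in for whatever decides which position gets updated, kBins is an arbitrary small size, and the function names are made up. The first function uses an array-section reduction (OpenMP 4.5 or later), which gives each thread its own copy of the array; the second updates one shared array with atomic increments and does nothing to avoid false sharing.

```cpp
#include <cassert>
#include <vector>

constexpr int kBins = 8;  // assumed array size, chosen for illustration

// Hypothetical stand-in: which array position iteration i updates.
int slot_for(int i) { return (i * 7) % kBins; }

// Array reduction: each thread gets a private copy of hist, and the copies
// are summed element by element at the end of the loop.
std::vector<int> histogram_reduction(int n) {
    assert(n >= 0);
    int hist[kBins] = {0};
    #pragma omp parallel for reduction(+:hist[:kBins])
    for (int i = 0; i < n; i++)
        hist[slot_for(i)] += 1;
    return std::vector<int>(hist, hist + kBins);
}

// Shared array with atomic updates: no per-thread copies, but every
// increment is synchronized.
std::vector<int> histogram_atomic(int n) {
    assert(n >= 0);
    std::vector<int> hist(kBins, 0);
    int* h = hist.data();
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        const int s = slot_for(i);
        #pragma omp atomic
        h[s] += 1;
    }
    return hist;
}
```

The reduction version is usually the simpler choice while a per-thread copy of the array fits in memory; the atomic version trades that memory for contention on the shared array.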

You can use the following structure to perform a parallel reduction with thread-private variables, since your update is an associative operation on a scalar.

float a = 0; // global, shared by all threads
#pragma omp parallel
{
    float y = 0; // private to each thread
#pragma omp for
    for (int i = 0; i < x; i++)
        y += algorithm(); // better practice: don't reuse the name of the loop termination variable
    // still inside the parallel region
#pragma omp atomic
    a += y;
}
cout << a;
Tryer
  • That's exactly what OpenMP reduction is for. BTW `x` inside the OP's loop is not the termination variable, it's a new local variable. – Daniel Langr Nov 07 '18 at 10:15
  • Agreed on the `x` part and modified my answer accordingly. I think for a beginner, breaking this down into smaller parts makes for better understanding of what is going on instead of using all clauses in one go. – Tryer Nov 07 '18 at 10:21