NoseKnowsAll has correctly identified your problem.
I would like to explain more about why this problem happened. You could have done this with a square loop like this:
#pragma omp parallel for
for(int i=0; i<n; i++) {
    phys_vector sum = 0;
    for(int j=0; j<n; j++) {
        if(i==j) continue;
        //calculate deltaf
        sum += deltaf;
    }
    forces[i] = sum;
}
which uses n*(n-1) iterations and is easy to parallelize. But since force(i,j) = -force(j,i), we can do this in half the iterations, n*(n-1)/2, using a triangular loop (which is what you have done):
for(int i=0; i<n; i++) {
    phys_vector sum = 0;
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;
    }
    forces[i] += sum;
}
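For concreteness, here is a minimal, self-contained version of that serial triangular loop. The 2D phys_vector and the inverse-square force law below are hypothetical stand-ins, since the question's actual types and the //calculate deltaf step are not shown:
#include <cmath>
#include <cstdio>
#include <vector>

// hypothetical stand-in for the question's phys_vector type
struct phys_vector {
    double x = 0, y = 0;
    phys_vector& operator+=(const phys_vector& o) { x += o.x; y += o.y; return *this; }
    phys_vector& operator-=(const phys_vector& o) { x -= o.x; y -= o.y; return *this; }
};

int main() {
    // hypothetical bodies: positions and masses
    std::vector<phys_vector> pos  = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    std::vector<double>      mass = {1, 2, 3, 4};
    int n = pos.size();
    std::vector<phys_vector> forces(n);

    for (int i = 0; i < n; i++) {
        phys_vector sum;
        for (int j = i + 1; j < n; j++) {
            // calculate deltaf: inverse-square attraction of body i towards body j
            double dx = pos[j].x - pos[i].x, dy = pos[j].y - pos[i].y;
            double r2 = dx*dx + dy*dy;
            double s  = mass[i] * mass[j] / (r2 * std::sqrt(r2));
            phys_vector deltaf = {s * dx, s * dy};
            sum       += deltaf;   // force on i from j
            forces[j] -= deltaf;   // Newton's third law: force on j from i
        }
        forces[i] += sum;          // forces[i] already holds -deltaf terms from earlier i
    }

    for (int i = 0; i < n; i++) printf("force[%d] = (%g, %g)\n", i, forces[i].x, forces[i].y);
}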
The problem is that this optimization makes the outer loop more difficult to parallelize. There are two issues: several threads now write to forces[j], and the iterations are no longer evenly distributed, i.e. the first thread runs over more iterations than the last thread.
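To see the second issue concretely, here is a small sketch (the values of n and nthreads are just example numbers, not from the question) that counts how many inner iterations each thread would get if the outer triangular loop were split into equal contiguous chunks of i, as a plain schedule(static) would do:
#include <cstdio>

int main() {
    int n = 1000, nthreads = 4;
    // contiguous chunks of the outer loop, as with schedule(static)
    for (int t = 0; t < nthreads; t++) {
        int begin = t * n / nthreads, end = (t + 1) * n / nthreads;
        long long work = 0;
        for (int i = begin; i < end; i++) work += n - 1 - i;  // inner iterations for this i
        printf("thread %d: %lld inner iterations\n", t, work);
    }
}
With these example numbers the first thread gets roughly seven times the work of the last one (218625 vs. 31125 inner iterations).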
The easy solution is to parallelize the inner loop:
#pragma omp parallel
for(int i=0; i<n; i++) {
    phys_vector sum = 0;
    #pragma omp for
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;
    }
    #pragma omp critical
    forces[i] += sum;
}
This uses n*nthreads critical operations out of a total of n*(n-1)/2 iterations, so the relative cost of the critical sections shrinks as n gets larger. You could use a private forces vector for each thread and merge them in a critical section, but I don't think this is necessary since the critical operations are on the outer loop and not the inner loop.
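For completeness, a sketch of what that per-thread alternative could look like (using the same forces_local idea as the fused versions below, and merging once per thread instead of once per outer iteration):
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(n);   // private to each thread
    for(int i=0; i<n; i++) {
        #pragma omp for
        for(int j=i+1; j<n; j++) {
            //calculate deltaf
            forces_local[i] += deltaf;
            forces_local[j] -= deltaf;
        }
    }
    #pragma omp critical
    for(int i=0; i<n; i++) forces[i] += forces_local[i];
}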
Here is a solution which fuses the triangular loop so that each thread runs over the same number of iterations.
unsigned n = bodies.size();
unsigned r = n*(n-1)/2;
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size());
    #pragma omp for schedule(static)
    for(unsigned k=0; k<r; k++) {
        // map the flattened index k to the pair (i,j) with i > j
        unsigned i = (1 + sqrt(1.0+8.0*k))/2;
        unsigned j = k - i*(i-1)/2;
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}
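If you want to convince yourself that the index formula hits every pair exactly once, a quick standalone check (my own test harness, separate from the solution) is:
#include <cmath>
#include <cstdio>
#include <set>
#include <utility>

int main() {
    unsigned n = 100;
    unsigned r = n*(n-1)/2;
    std::set<std::pair<unsigned,unsigned>> seen;
    for (unsigned k = 0; k < r; k++) {
        unsigned i = (1 + std::sqrt(1.0 + 8.0*k)) / 2;
        unsigned j = k - i*(i-1)/2;
        if (i >= n || j >= i) { printf("bad pair at k=%u: (%u,%u)\n", k, i, j); return 1; }
        seen.insert({i, j});
    }
    // r distinct valid pairs means every (i,j) with i > j was produced exactly once
    printf("%zu distinct pairs out of %u\n", seen.size(), r);
}
For n = 100 this reports 4950 distinct pairs; for very large n you would start to worry about the precision of sqrt, which is exactly the concern addressed next.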
I was unhappy with my previous method for fusing a triangle (because it needs to use floating point and the sqrt function) so I came up with a much simpler solution based on this answer.
This maps a triangle to a rectangle and vice versa. First I convert to a rectangle with width n but with n*(n-1)/2 elements (the same number as the triangle). Then I calculate the (row, column) values of the rectangle, and to map to a triangle (which skips the diagonal) I use the following formula:
//i is the row, j is the column of the rectangle
if(j<=i) {
    i = n - i - 2;
    j = n - j - 1;
}
Let's choose an example. Consider the following n=5 triangular loop pairs:
(0,1), (0,2), (0,3), (0,4)
(1,2), (1,3), (1,4)
(2,3), (2,4)
(3,4)
mapping this to a rectangle becomes
(3,4), (0,1), (0,2), (0,3), (0,4)
(2,4), (2,3), (1,2), (1,3), (1,4)
Triangle loops with an even n work the same way, though it might not be as obvious. For example, for n = 4:
(0,1), (0,2), (0,3)
(1,2), (1,3)
(2,3)
this becomes
(2,3), (0,1), (0,2), (0,3)
(1,2), (1,3)
This is not exactly a rectangle but the mapping works the same. I could have instead mapped it as
(0,1), (0,2), (0,3)
(2,3), (1,2), (1,3)
which is a rectangle but then I would need two formulas for odd and even triangle sizes.
Here is the new code using the rectangle-to-triangle mapping:
unsigned n = bodies.size();
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size());
    #pragma omp for schedule(static)
    for(unsigned k=0; k<n*(n-1)/2; k++) {
        unsigned i = k/n;     // row of the rectangle
        unsigned j = k%n;     // column of the rectangle
        if(j<=i) {            // fold the wrapped part back into the triangle
            i = n - i - 2;
            j = n - j - 1;
        }
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}
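As a quick sanity check on the integer mapping, here is a small standalone sketch (my own test, separate from the solution) that prints the pairs generated for n = 4 and n = 5 in k order and confirms each pair below the diagonal appears exactly once; it reproduces the listings above:
#include <cstdio>
#include <set>
#include <utility>

int main() {
    unsigned sizes[] = {4, 5};
    for (unsigned n : sizes) {
        std::set<std::pair<unsigned,unsigned>> seen;
        printf("n = %u:", n);
        for (unsigned k = 0; k < n*(n-1)/2; k++) {
            unsigned i = k/n, j = k%n;      // (row, column) of the rectangle
            if (j <= i) {                   // fold back into the triangle
                i = n - i - 2;
                j = n - j - 1;
            }
            printf(" (%u,%u)", i, j);
            seen.insert({i, j});
        }
        printf("  -> %zu distinct pairs of %u\n", seen.size(), n*(n-1)/2);
    }
}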