
How can I parallelize this code using OpenMP? xp, yp, zp, gpx, gpy, and gpz are known 1D vectors.

int ies, jes, kes;   /* loop indices, declared before the loops */
for (ies = 0; ies < 1000000; ies++){
    for (jes = ies+1; jes < 1000000; jes++){

        double dxp = xp[ies] - xp[jes];
        double dyp = yp[ies] - yp[jes];
        double dzp = zp[ies] - zp[jes];
        double distance = sqrt( dxp * dxp + dyp * dyp + dzp * dzp );
        double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
        #pragma omp parallel for
        for (kes = 1; kes <= 100; kes++){

            double distan = kes * distance;
            E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
        }
    }
}
Luc Nguyen

2 Answers


Here is a possibility (not tested):

#pragma omp parallel for reduction(+: E1) private(jes, kes) schedule(dynamic)
for (ies = 0; ies < 1000000; ies++){
    for (jes = ies+1; jes < 1000000; jes++){

        double dxp = xp[ies] - xp[jes];
        double dyp = yp[ies] - yp[jes];
        double dzp = zp[ies] - zp[jes];
        double distance = sqrt( dxp * dxp + dyp * dyp + dzp * dzp );
        double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
        for (kes = 1; kes <= 100; kes++){

            double distan = kes * distance;
            E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
        }
    }
}

I've used schedule(dynamic) to try to compensate for the workload imbalance between threads introduced by the triangular shape of the (ies, jes) index domain that the loops cover. Also, depending on how E1 is defined, the reduction(+: E1) clause may or may not be accepted by your compiler. But in any case, if it isn't accepted, you can always do the reduction by hand with a critical construct, as sketched below.
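
A minimal sketch of that manual reduction, assuming E1 is a plain C array of 101 doubles (index 0 unused, as in the question):

#pragma omp parallel
{
    /* Each thread accumulates into its own private copy of E1. */
    double E1_local[101] = {0.0};

    #pragma omp for schedule(dynamic)
    for (int ies = 0; ies < 1000000; ies++){
        for (int jes = ies+1; jes < 1000000; jes++){

            double dxp = xp[ies] - xp[jes];
            double dyp = yp[ies] - yp[jes];
            double dzp = zp[ies] - zp[jes];
            double distance = sqrt( dxp * dxp + dyp * dyp + dzp * dzp );
            double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
            for (int kes = 1; kes <= 100; kes++){

                double distan = kes * distance;
                E1_local[kes] += gpspec * sin(distan) / distan;
            }
        }
    }

    /* Merge the per-thread partial sums, one thread at a time. */
    #pragma omp critical
    for (int kes = 1; kes <= 100; kes++)
        E1[kes] += E1_local[kes];
}

Each thread enters the critical section only once, after finishing its share of the ies iterations, so the serialization cost is negligible compared to the work done in the loops.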

Gilles
  • Thank you very much for your suggestions. I tested the code with '#pragma omp parallel for reduction(+: E1) private(ies, jes, kes)', and the calculation time dropped significantly. However, with '#pragma omp parallel for reduction(+: E1[:]) private(jes, kes) schedule(dynamic)', the calculation is even faster and it keeps all cores fully loaded. 'reduction(+: E1[:])' is accepted by the compiler. – Luc Nguyen Oct 15 '20 at 03:13
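
For reference, a minimal sketch of the array-section reduction mentioned in the comment above, assuming E1 is declared as double E1[101] (if E1 is a pointer, the section length must be spelled out), with the loop nest otherwise unchanged from the answer:

/* Only the reduction clause changes: E1[:101] names the array
   section covering elements 0..100, so each thread gets a private
   copy that OpenMP combines automatically at the end of the
   region. Requires OpenMP 4.5 or later. */
#pragma omp parallel for reduction(+: E1[:101]) private(jes, kes) schedule(dynamic)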

You already have an omp parallel for pragma on the innermost loop. For that to take effect, you probably need to enable OpenMP support in your compiler by setting a compiler flag (for example, with the GCC compiler suite, that would be the -fopenmp flag). You may also need to #include the omp.h header.
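
For example, a typical GCC command line might look like the following (the file name is a placeholder; -lm links the math library for sqrt and sin):

gcc -O2 -fopenmp yourprogram.c -o yourprogram -lm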

But with that said, I doubt you're going to gain much from this parallelization, because one run of the loop you are parallelizing just doesn't do much work. There is runtime overhead associated with parallelization that offsets the gains from running multiple loop iterations at the same time, so I don't think you're going to net very much.

John Bollinger
  • I know how to compile the code and use -fopenmp and the #include. I parallelized only the "kes" loop, but I think it is not effective. I would like to parallelize the "ies" loop or the "jes" loop together with the "kes" loop, because the size of the "ies" loop = size of the "jes" loop = 1,000,000. – Luc Nguyen Oct 13 '20 at 14:59
  • @LucNguyen, if that's what you wanted to know about then you should have said so in the question. For that case, you probably want an [array reduction](https://stackoverflow.com/a/20421200/2402272) to overcome the data races that you otherwise would have if you broadened the parallelization scope. – John Bollinger Oct 13 '20 at 15:55