I am trying to do a reduction over multiple variables (an array) using OMP, but I'm not sure how to express it in OMP. See the code below.

#pragma omp parallel for reduction( ??? )
for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
                // [ compute value ... ]

                y[j] += value;
        }
}

I thought I could do something like this with the `atomic` directive, but realised this would prevent two threads from updating y at the same time, even if they are updating different elements.

#pragma omp parallel for
for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
                // [ compute value ... ]

                #pragma omp atomic
                y[j] += value;
        }
}

Does OMP have any functionality for something like this? If not, how would I achieve this optimally without OMP's reduction clause?

Adrian Mole
  • You could declare only the `j` loop to be `omp parallel`. If that's inefficient, for instance because the loop is too short, then try to exchange the two loops. – Victor Eijkhout Mar 06 '22 at 22:21
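A minimal sketch of the loop exchange suggested in this comment, assuming `y` holds `m` elements and using a hypothetical `compute_value(i, j)` in place of the question's `[ compute value ... ]`. Because each `y[j]` is written by exactly one thread, no reduction, `atomic` or `critical` is needed:

```c
/* Hypothetical placeholder for "[ compute value ... ]" in the question. */
static double compute_value(int i, int j)
{
    return (double)(i + 2 * j);
}

/* Parallelize over j instead of i: each thread owns a distinct set of
 * y[j] entries, so there is no race on y. */
void accumulate_swapped(double *y, int n, int m)
{
    #pragma omp parallel for
    for (int j = 0; j < m; j++) {
        double sum = 0.0;              /* private accumulator for this y[j] */
        for (int i = 0; i < n; i++)
            sum += compute_value(i, j);
        y[j] += sum;                   /* single write per element */
    }
}
```

This only helps if `m` is large enough to give every thread work; it also fixes the summation order for each `y[j]`, so results do not vary between runs.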

1 Answer

There is an array reduction available in OpenMP since version 4.5:

#pragma omp parallel for reduction(+:y[:m])

where m is the number of elements to reduce (the array section `y[:m]` covers `y[0]` through `y[m-1]`). The main limitation is that the private copy of the array that each thread uses for the reduction is typically allocated on the stack, so it cannot be used for large arrays.
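A self-contained sketch of this array-section reduction, assuming an OpenMP 4.5 (or newer) compiler and the same hypothetical `compute_value(i, j)` placeholder. Array sections are not limited to fixed-size arrays: `y` can be a plain pointer to memory allocated at run time, with `m` a run-time value, which also covers the dynamically allocated case raised in the comments below. The helper `accumulate_new` is hypothetical and only illustrates that usage:

```c
#include <stdlib.h>

/* Hypothetical placeholder for "[ compute value ... ]". */
static double compute_value(int i, int j)
{
    return (double)(i + 2 * j);
}

/* Requires OpenMP 4.5+. Each thread gets a zero-initialized private copy
 * of y[0] .. y[m-1]; the copies are added into y when the loop ends. */
void accumulate_reduction(double *y, int n, int m)
{
    #pragma omp parallel for reduction(+:y[:m])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            y[j] += compute_value(i, j);
        }
    }
}

/* Usage with a heap-allocated array whose size is only known at run time.
 * Returns NULL on allocation failure; the caller must free() the result. */
double *accumulate_new(int n, int m)
{
    double *y = calloc((size_t)m, sizeof *y);
    if (y != NULL)
        accumulate_reduction(y, n, m);
    return y;
}
```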

The atomic operation you mentioned should work fine, but it may be less efficient than a reduction. Of course, it depends on the actual circumstances (e.g. the actual values of n and m, the time needed to compute value, false sharing, etc.).

#pragma omp atomic
  y[j] += value;
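For comparison, a sketch of the `atomic` variant as a complete function, again with the hypothetical `compute_value(i, j)`. `value` is computed outside the atomic construct so that only the update of `y[j]` itself is protected:

```c
/* Hypothetical placeholder for "[ compute value ... ]". */
static double compute_value(int i, int j)
{
    return (double)(i + 2 * j);
}

/* Parallel over i with an atomic update of each y[j]. The atomic applies
 * to one memory location at a time, so updates to different elements do
 * not block each other; contention and false sharing can still cost time. */
void accumulate_atomic(double *y, int n, int m)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            double value = compute_value(i, j);  /* not inside the atomic */
            #pragma omp atomic
            y[j] += value;
        }
    }
}
```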
Laci
  • Ah... in my particular case y is dynamically allocated, with its size determined at runtime. As you suggested, the atomic operation does work, but from my understanding it would hurt performance unnecessarily: in theory two threads could update y[i] and y[j] for different i and j at the same time, but the atomic operation would not allow them to. Is this correct? – DavieRodger Mar 06 '22 at 19:21
  • An atomic operation always gives correct results, and it does allow different elements y[i] and y[j] to be updated at the same time. The only performance-related problem is that if they are in the same cache line, each write invalidates that cache line for the other cores. This is called 'false sharing'. If array `y` is expected to be big, the best option is to do the reduction manually. – Laci Mar 06 '22 at 19:28
  • Please read [this](https://stackoverflow.com/questions/20413995/reducing-on-array-in-openmp) if you wish to implement manual array reduction in OpenMP. – Laci Mar 06 '22 at 19:32
  • Is computing `value` slow or fast? If it is fast, is it possible to swap the `for` loops (i.e. do `for(int j=...)` first)? – Laci Mar 06 '22 at 20:12
  • I think in your comment you confused `#pragma omp critical` and `#pragma omp atomic`. `#pragma omp critical` is the one that does not allow multiple threads to execute the protected code in parallel. – Laci Mar 06 '22 at 20:22
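One way to do the manual reduction mentioned in these comments is sketched below, under the same assumptions (hypothetical `compute_value(i, j)`, `y` holding `m` elements). Each thread accumulates into its own row of a heap-allocated buffer, which avoids the stack-size limit of the built-in array reduction, and the rows are then merged with the merge loop split over `j`. The `_OPENMP` guards let the same code compile, serially, without OpenMP support. This is one possible design, not necessarily the approach in the linked question:

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical placeholder for "[ compute value ... ]". */
static double compute_value(int i, int j)
{
    return (double)(i + 2 * j);
}

/* Manual array reduction with per-thread heap buffers.
 * Returns 0 on success, -1 if the buffer could not be allocated. */
int accumulate_manual(double *y, int n, int m)
{
    if (m <= 0)
        return 0;
#ifdef _OPENMP
    int nthreads = omp_get_max_threads();
#else
    int nthreads = 1;
#endif
    /* One zero-initialized row of m partial sums per possible thread. */
    double *partial = calloc((size_t)nthreads * (size_t)m, sizeof *partial);
    if (partial == NULL)
        return -1;

    #pragma omp parallel num_threads(nthreads)
    {
#ifdef _OPENMP
        double *local = partial + (size_t)omp_get_thread_num() * (size_t)m;
#else
        double *local = partial;
#endif
        /* Phase 1: each thread only touches its own row. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                local[j] += compute_value(i, j);

        /* The implicit barrier above guarantees all rows are complete.
         * Phase 2: each y[j] is summed over all rows by exactly one thread. */
        #pragma omp for
        for (int j = 0; j < m; j++)
            for (int t = 0; t < nthreads; t++)
                y[j] += partial[(size_t)t * (size_t)m + (size_t)j];
    }

    free(partial);
    return 0;
}
```

Merging each thread's buffer into `y` inside a `#pragma omp critical` block would also be correct, but it serializes the merge, whereas splitting it over `j` lets every thread help.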