Reduction with OpenMP

Question

I am trying to compute the mean of a 2D matrix using OpenMP. This 2D matrix is actually an image.

I am dividing the data among the threads. For example, if I have N threads, then I process Rows/N rows with thread 0, and so on.

My question is: Can I use the openmp reduction clause with "#pragma omp parallel"?

#pragma omp parallel reduction( + : sum )
{
    if( thread == 0 )
    {
       /* bla bla code that computes val for thread 0's rows */
       sum = sum + val;
    }
    else if( thread == 1 )
    {
       /* bla bla code that computes val for thread 1's rows */
       sum = sum + val;
    }
}

The short answer is YES! Most tutorials only show `reduction` attached to the `parallel for` construct, but `reduction` on the `parallel` construct is also correct and sometimes useful. – Xin Cheng, Apr 08 '19 at 14:40

Hristo Iliev · Answer 1 · 2015-06-18T11:14:31.940

34

Yes, you can - the reduction clause is applicable to the whole parallel region as well as to individual for worksharing constructs. This allows for e.g. reduction over computations done in different parallel sections (the preferred way to restructure the code):

#pragma omp parallel sections private(val) reduction(+:sum)
{
   #pragma omp section
   {
      /* bla bla code that computes val */
      sum += val;
   }
   #pragma omp section
   {
      /* bla bla code that computes val */
      sum += val;
   }
}
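For reference, the manual row partitioning described in the question can also be written with the reduction clause placed directly on the parallel construct. The following is only a minimal sketch: the function name `image_mean_manual`, the row-major `unsigned char` image layout, and the block-wise split of rows are assumptions made for illustration and are not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: mean of a rows x cols image stored row-major. */
double image_mean_manual(const unsigned char *img, int rows, int cols)
{
    assert(img != NULL && rows > 0 && cols > 0);
    double sum = 0.0;

    #pragma omp parallel reduction(+:sum)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0, nthreads = 1;          /* serial fallback without OpenMP */
#endif
        /* Contiguous block of rows per thread; trailing blocks may be shorter or empty. */
        int chunk = (rows + nthreads - 1) / nthreads;
        int first = tid * chunk;
        int last = (first + chunk < rows) ? first + chunk : rows;

        for (int r = first; r < last; r++)
            for (int c = 0; c < cols; c++)
                sum += img[(size_t)r * cols + c];   /* adds to this thread's private copy */
    }   /* the private copies of sum are combined into the original sum here */

    return sum / ((double)rows * cols);
}
```

Each thread accumulates into its own private copy of `sum`, so no explicit `if (thread == ...)` branches are needed; the thread number is only used to pick the block of rows.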

You can also use the OpenMP for worksharing construct to automatically distribute the loop iterations among the threads in the team instead of reimplementing it using sections:

#pragma omp parallel for private(val) reduction(+:sum)
for (row = 0; row < Rows; row++)
{
   /* bla bla code that computes val */
   sum += val;
}
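As a more complete illustration of the `parallel for` variant applied to the image mean from the question, here is a minimal sketch. The function name `image_mean` and the row-major `unsigned char` image layout are assumptions for the example; `row_sum` plays the role of `val` and is private simply because it is declared inside the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of a rows x cols image stored row-major. */
double image_mean(const unsigned char *img, int rows, int cols)
{
    assert(img != NULL && rows > 0 && cols > 0);
    double sum = 0.0;

    /* Rows are distributed among the threads; each thread gets a private sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int r = 0; r < rows; r++) {
        double row_sum = 0.0;               /* declared in the loop body, hence private */
        for (int c = 0; c < cols; c++)
            row_sum += img[(size_t)r * cols + c];
        sum += row_sum;
    }

    return sum / ((double)rows * cols);
}
```

When compiled without OpenMP support the pragma is ignored and the function simply runs serially, which is a convenient way to check the result.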

Note that reduction variables are private and their intermediate values (i.e. the value they hold before the reduction at the end of the parallel region) are only partial and not very useful. For example the following serial loop cannot be (easily?) transformed to a parallel one with reduction operation:

for (row = 0; row < Rows; row++)
{
   /* bla bla code that computes val */
   sum += val;
   if (sum > threshold)
   {
      /* yada yada code */
   }
}

Here the yada yada code should be executed in each iteration once the accumulated value of sum has passed the value of threshold. When the loop is run in parallel, the private values of sum might never reach threshold, even if their sum does.
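The effect can be made visible with a small experiment. The sketch below is an illustration only: the function names, the use of a hit counter in place of the yada yada code, and the `long` types are all assumptions. Both functions count the iterations in which the running sum exceeds the threshold, once serially and once with a reduction.

```c
#include <assert.h>

/* Hypothetical demo: serial loop, the test sees the true running sum. */
long count_hits_serial(const int *val, int n, long threshold, long *total)
{
    long sum = 0, hits = 0;
    for (int i = 0; i < n; i++) {
        sum += val[i];
        if (sum > threshold)
            hits++;
    }
    *total = sum;
    return hits;
}

/* Hypothetical demo: same loop with a reduction on both sum and hits. */
long count_hits_parallel(const int *val, int n, long threshold, long *total)
{
    assert(val != 0 && total != 0);
    long sum = 0, hits = 0;
    #pragma omp parallel for reduction(+:sum, hits)
    for (int i = 0; i < n; i++) {
        sum += val[i];
        if (sum > threshold)    /* tests only this thread's partial sum */
            hits++;
    }
    *total = sum;               /* the reduced total is still correct */
    return hits;
}
```

With every `val[i]` equal to 1, `n = 1000` and `threshold = 900`, the serial version counts 100 hits. With several threads the parallel version typically counts fewer, often none, because each thread's partial sum stays below the threshold, even though the reduced total is still 1000.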

edited Jun 18 '15 at 11:14

answered Nov 08 '12 at 14:13

Hristo Iliev


If he calls ordered with that kind of distribution, he loses most of the parallelism. – dreamcrash Nov 08 '12 at 16:25

@dreamcrash, if implemented correctly, ordered execution might not kill most of the parallelism - see [this answer](http://stackoverflow.com/a/13230816/1374437). – Hristo Iliev Nov 08 '12 at 16:31
Exactly, we must not use the `for` construct with a static schedule and the default chunk size. – dreamcrash Nov 08 '12 at 16:40
In the second case, if instead of a reduction on a sum, e.g. with `reduction(+:...`, the goal were to find the minimum or maximum, e.g. with `reduction(min:...`, one could do it manually using double-checked locking and it would work fine. – Z boson May 08 '14 at 07:06
@Miszy, the edit reason you've specified is invalid. Neither `shared(Rows)` nor `private(row)` is _required_ by the OpenMP standard. `row` is the loop counter and as such has predetermined data sharing class of `private`. `Rows` is declared outside the parallel region and as such has a predetermined sharing class of `shared`. – Hristo Iliev Jun 16 '15 at 08:42
It's not required _per se_, but it's a matter of good style to add them. If you look at any of the OpenMP examples (including the one for this reduction clause), they all have explicit `shared` and `private`. – Michał Miszczyszyn Jun 16 '15 at 08:45
@Miszy Would you start adding `default(none)` to all StackOverflow code just because it is "good style"? – Vladimir F Героям слава Jun 17 '15 at 15:22
@VladimirF, I added the `default(none)` clause because listing all variables in the respective clauses makes sense only if done in combination with `default(none)`. – Hristo Iliev Jun 17 '15 at 19:24
@HristoIliev It was just to point out that the original edit didn't make sense. I actually didn't notice you added that; I thought you had reverted the edit, as I was only looking at the history. – Vladimir F Героям слава Jun 17 '15 at 19:33

score 0 · Answer 2 · answered May 07 '14 at 20:14


In your case, sum = sum + val could be interpreted as val[i] = val[i-1] + val[i] in a 1-d array (or val[rows][cols] = val[rows][cols-1] + val[rows][cols] in a 2-d array), which is a prefix sum calculation.

Reduction is one solution for prefix sums; you can apply a reduction with any commutative-associative operator like '+', '-', '*', '/'.


Charles Chow


How is `-` and `/` commutative-associative? – Shreyash S Sarnayak May 09 '17 at 05:15
1 - 2 = 1 + (-2) = (-2) + 1.... 1/5 = 1 * (0.2) = (0.2) * 1 – DarkCygnus Mar 17 '22 at 18:26
