22

Does OpenMP natively support reduction of a variable that represents an array?

This would work something like the following...

float* a = (float*) calloc(4, sizeof(float));  // calloc takes (count, size)
omp_set_num_threads(13);
#pragma omp parallel reduction(+:a)
for(i=0;i<4;i++){
   a[i] += 1;  // Thread-local copy of a incremented by something interesting
}
// a now contains [13 13 13 13]

Ideally, there would be something similar for an omp parallel for, and with enough threads for it to make sense, the accumulation would happen via a binary tree.

Andrew Wagner
  • Maybe you could explain a bit more about what exactly you want to do. Providing serial code might help. – FFox Sep 27 '10 at 05:49
  • Digging around a bit more, it sounds like "only in Fortran" is the answer. I ended up just allocating a single large array of local copies outside of the loop, letting the threads accumulate to their own copies within the for loop, then accumulating into a global array after the for loop, still inside the parallel region, inside of a critical section. – Andrew Wagner Sep 27 '10 at 19:52
  • Digging even more, here is a research paper on something similar, but it's not in OpenMP yet. http://www.springerlink.com/content/tq76655852630525/ – Andrew Wagner Oct 01 '10 at 14:05
  • You can probably use atomic rather than critical to guard the individual adds (or even an array of locks) if you want to reduce the overhead; you could even use an array of shared arrays rather than private arrays and try to roll your own binary reduction. But it'll be ugly. – Jonathan Dursi Oct 22 '10 at 12:00
  • I ended up manually allocating space for thread-local copies of the arrays. Each thread does 1/8 of the accumulation into its local copy, and then the threads accumulate their local copy into a global copy inside of a #pragma omp critical block. Since the number of cores (8) is much smaller than n, the synchronization overhead is negligible. It ain't pretty, but it works. – Andrew Wagner Oct 24 '10 at 17:21
  • using OpenMP with C++ cannot be recommended: OpenMP does not support recent C++ standards. With C++ you may either want to use `std::thread`s etc, or [tbb](https://www.threadingbuildingblocks.org/) – Walter May 13 '16 at 21:30
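The workaround described in the comments above (thread-local copies merged into a global array inside a critical section) might look roughly like the following sketch. The function name `bucket_counts` and the bucketing rule are hypothetical, chosen only for illustration, and the input values are assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts how many values fall into each of `bins` buckets.
// Assumes every value in `data` is non-negative.
std::vector<long> bucket_counts(const std::vector<int>& data, int bins)
{
    assert(bins > 0);
    std::vector<long> global(bins, 0);

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread gets its own copy.
        std::vector<long> local(bins, 0);

        // Iterations are split among the threads; each thread writes only to `local`.
        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(data.size()); ++i)
        {
            local[data[i] % bins] += 1;
        }

        // Threads merge their local copy into the shared array one at a time.
        #pragma omp critical
        {
            for (int b = 0; b < bins; ++b)
            {
                global[b] += local[b];
            }
        }
    }
    return global;
}
```

The critical section is entered once per thread rather than once per element, which is why its cost stays small when the number of threads is much smaller than the amount of work. If the code is compiled without OpenMP enabled, the pragmas are ignored and the function simply runs serially.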

5 Answers

9

Array reduction is now possible with OpenMP 4.5 for C and C++. Here's an example:

#include <iostream>

int main()
{

  int myArray[6] = {};

  #pragma omp parallel for reduction(+:myArray[:6])
  for (int i=0; i<50; ++i)
  {
    double a = 2.0; // Or something non-trivial justifying the parallelism...
    for (int n = 0; n<6; ++n)
    {
      myArray[n] += a;
    }
  }
  // Print the array elements to see them summed   
  for (int n = 0; n<6; ++n)
  {
    std::cout << myArray[n] << " " << std::endl;
  } 
}

Outputs:

100
100
100
100
100
100

I compiled this with GCC 6.2. You can see which common compiler versions support the OpenMP 4.5 features here: https://www.openmp.org/resources/openmp-compilers-tools/

As noted in the comments on this page, the syntax is convenient, but it can incur significant overhead because each thread creates its own copy of the array section.

Jeff Trull
decvalts
3

Only in Fortran in OpenMP 3.0, and probably only with certain compilers.

See the last example (Example 3) on:

http://wikis.sun.com/display/openmp/Fortran+Allocatable+Arrays

Andrew Wagner
  • It has been possible since OpenMP 4.5; see the answer of Chen Jiang below. Basically, you must specify _array sections_ (see Section 2.4, p. 44 of the OpenMP 4.5 spec). Your #pragma would look like this: `#pragma omp parallel reduction(+:a[:4])`. Be careful with this, however: each thread allocates its own copy of the array section, so on large arrays with many threads the memory usage can blow up. – Hugo Raguet Jun 02 '16 at 14:55
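As a rough illustration of the array-section syntax from the comment above, here is a minimal sketch that reduces over heap memory reached through a pointer. The function name `accumulate_rows` and its parameters are hypothetical. It uses `parallel for` rather than a bare `parallel` region so that the result does not depend on how many threads the runtime actually provides, and it assumes a compiler with OpenMP 4.5 support (for example GCC 6 or later with `-fopenmp`).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds `value` to each of the `n` elements, `iterations` times.
std::vector<float> accumulate_rows(int n, int iterations, float value)
{
    assert(n > 0);
    std::vector<float> result(n, 0.0f);

    // The section a[:n] is written against a raw pointer, since the length of
    // a pointer-based section has to be given explicitly.
    float* a = result.data();

    // Each thread gets a zero-initialized private copy of a[0..n-1];
    // the copies are summed back into the original storage at the end.
    #pragma omp parallel for reduction(+:a[:n])
    for (int it = 0; it < iterations; ++it)
    {
        for (int i = 0; i < n; ++i)
        {
            a[i] += value;
        }
    }
    return result;
}
```

Calling `accumulate_rows(4, 13, 1.0f)` corresponds to the question's scenario of 13 unit increments per element. The private copies are where the memory cost mentioned in the comment comes from: roughly `n` elements per thread.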
2

The latest OpenMP 4.5 spec now supports reduction of C/C++ arrays. http://openmp.org/wp/2015/11/openmp-45-specs-released/

The latest GCC release, 6.1, also supports this feature. http://openmp.org/wp/2016/05/gcc-61-released-supports-openmp-45/

I haven't tried it myself yet, though. I hope others can test this feature.

Chen Jiang
1

In OpenMP 3.x and earlier, reductions cannot be performed on array or structure type variables in C/C++ (see restrictions).

You also might want to read up on the private and shared clauses. private declares a variable to be private to each thread, whereas shared declares a variable to be shared among all threads. I also found the answer to this question very useful with regard to OpenMP and arrays.
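To make the private/shared distinction concrete, here is a small sketch. The function `sum_of_squares` and its variable names are made up for illustration. `square` is private, so each thread has its own scratch copy, while `total` is shared and therefore has to be updated atomically.

```cpp
#include <cassert>

// Hypothetical helper: returns 1*1 + 2*2 + ... + n*n.
long sum_of_squares(int n)
{
    assert(n >= 0);
    long total = 0;   // shared: a single variable visible to all threads
    long square = 0;  // private: every thread works on its own copy

    #pragma omp parallel for default(none) shared(total, n) private(square)
    for (int i = 1; i <= n; ++i)
    {
        square = static_cast<long>(i) * i;  // no race, the copy is per-thread

        // Updates to the shared variable must be protected.
        #pragma omp atomic
        total += square;
    }
    return total;
}
```

For a scalar like `total`, a `reduction(+:total)` clause would normally be the simpler choice; the atomic update is used here only to show what sharing a variable implies.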

Community
Garrett Hyde
0

OpenMP can perform this operation as of OpenMP 4.5, and GCC 6.3 (and possibly earlier versions) supports it. An example program looks as follows:

#include <vector>
#include <iostream>

int main(){
  std::vector<int> vec;

  #pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

  #pragma omp parallel for default(none) schedule(static) reduction(merge: vec)
  for(int i=0;i<100;i++)
    vec.push_back(i);

  for(const auto x: vec)
    std::cout<<x<<"\n";

  return 0;
}

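Note that omp_out and omp_in are special variables, and that the type named in the declare reduction directive must match the type of the vector you are planning to reduce on. Also note that OpenMP does not specify the order in which the per-thread results are combined, so the elements of vec are not guaranteed to come out in ascending order.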
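The same mechanism can also give an element-wise sum, which is closer to the array reduction the question asks about. The sketch below is one possible way to do it, not the only one. The reduction name `vec_plus` and the function `column_sums` are hypothetical, and the initializer assumes that every private copy should start as a zero vector of the same length as the original.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical user-defined reduction: element-wise sum of equal-length vectors.
// omp_orig refers to the original variable, omp_priv to a thread's private copy.
#pragma omp declare reduction(vec_plus : std::vector<int> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                   omp_out.begin(), std::plus<int>())) \
    initializer(omp_priv = std::vector<int>(omp_orig.size(), 0))

// Hypothetical helper: sums each column over all rows of width `width`.
std::vector<int> column_sums(const std::vector<std::vector<int>>& rows,
                             std::size_t width)
{
    std::vector<int> totals(width, 0);

    #pragma omp parallel for reduction(vec_plus : totals)
    for (long r = 0; r < static_cast<long>(rows.size()); ++r)
    {
        assert(rows[r].size() == width);
        for (std::size_t c = 0; c < width; ++c)
        {
            totals[c] += rows[r][c];  // writes to the thread's private vector
        }
    }
    return totals;
}
```

Because integer addition is commutative and associative, the unspecified combination order does not affect the result here, unlike in the push_back example.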

Richard