
I realize that the OpenMP reduction clause is only usable with POD types in C++. What would you do to implement a reduction with a complex<double> accumulator?

#include <complex>
using std::complex;

complex<double> x(0.0,0.0), y(1.0,1.0);
#pragma omp parallel for reduction(+:x)  // rejected: reduction(+) needs a built-in arithmetic type
for(int i=0; i<5; i++)
{
    x += y;
}

(I may have left some syntax out.) An obvious solution seems to be splitting the real and imaginary components into temporary doubles and accumulating those instead. I'm looking for elegance, though, and that seems... less than pretty. Would that be the typical approach here?
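For reference, the split-into-doubles workaround might look like the following minimal sketch (the function name `accumulate_split` is hypothetical). It mirrors the loop above: because `re` and `im` are plain doubles, any OpenMP version accepts them in `reduction(+:...)`. Compiled without `-fopenmp`, the pragma is simply ignored and the loop runs serially.

```cpp
#include <complex>

// Hypothetical helper: computes x += y, n times (as in the question), by
// reducing the real and imaginary parts as two plain doubles.
std::complex<double> accumulate_split(int n, std::complex<double> y) {
    double re = 0.0, im = 0.0;  // built-in types, so reduction(+) is allowed
    #pragma omp parallel for reduction(+:re, im)
    for (int i = 0; i < n; ++i) {
        re += y.real();
        im += y.imag();
    }
    return std::complex<double>(re, im);  // reassemble the complex result
}
```

It works, but every component has to be named by hand, which is the inelegance the question is about.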

alain.janinm
Fadecomic
    Personally, I wish they'd just drop the POD requirement for OpenMP in the next version of the spec; it would make it so much nicer to work with. Nice question, though. – Flexo Aug 23 '11 at 15:34

2 Answers


The typical workaround in the absence of user-defined reductions in OpenMP is even uglier than what you suggested. Usually, before the parallel region, people create an array with (at least) as many elements as there will be threads in the region, accumulate partial results separately for each thread using omp_get_thread_num() as the index into the array, and do the final reduction of the accumulated results in a loop after the parallel region.
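A minimal sketch of that per-thread-array workaround, assuming C++11 and a GCC-style `-fopenmp` build (the function name is hypothetical). One small deviation from the description: each thread accumulates into a local variable and writes its array slot only once, which avoids false sharing between neighbouring slots. `omp_get_max_threads()` sizes the array because it is an upper bound on the team size of the next parallel region; the `#ifdef _OPENMP` guards let the same code build and run serially without OpenMP.

```cpp
#include <complex>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: x += y, n times, using one array slot per thread
// and a serial final reduction after the parallel region.
std::complex<double> accumulate_per_thread(int n, std::complex<double> y) {
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();  // bounds the next team size
#else
    const int max_threads = 1;
#endif
    // One slot per thread, value-initialised to (0, 0).
    std::vector<std::complex<double>> partial(max_threads);

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        // Accumulate locally, then store once, to avoid false sharing.
        std::complex<double> local(0.0, 0.0);
        #pragma omp for
        for (int i = 0; i < n; ++i)
            local += y;
        partial[tid] = local;
    }

    // Final serial reduction over the per-thread results.
    std::complex<double> x(0.0, 0.0);
    for (const auto& p : partial)
        x += p;
    return x;
}
```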

As far as I know, the OpenMP language committee is working on adding user-defined reductions to the specification, so maybe this will finally be resolved in a few years.

Alexey Kukanov
  • OpenMP 4.0 now does have user-defined reductions. Obviously, that does not mean your compiler supports them already. GCC 4.9 (released 1 month ago) is probably the only one so far. – Jan Hudec Apr 24 '14 at 09:16
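With a compiler that implements OpenMP 4.0 user-defined reductions (for example GCC 4.9 or later, as the comment notes), the loop from the question can keep its complex<double> accumulator. A minimal sketch; `cplx_sum` and `accumulate_udr` are hypothetical names chosen for this example, not predefined identifiers. Without `-fopenmp` both pragmas are ignored and the loop runs serially.

```cpp
#include <complex>

// OpenMP 4.0 user-defined reduction: how to combine two partial results
// (omp_out += omp_in) and how to initialise each thread's private copy.
#pragma omp declare reduction(cplx_sum : std::complex<double> : omp_out += omp_in) \
    initializer(omp_priv = std::complex<double>(0.0, 0.0))

// Hypothetical helper: the question's loop, x += y repeated n times.
std::complex<double> accumulate_udr(int n, std::complex<double> y) {
    std::complex<double> x(0.0, 0.0);
    #pragma omp parallel for reduction(cplx_sum : x)
    for (int i = 0; i < n; ++i)
        x += y;
    return x;
}
```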

Sorry, OpenMP simply doesn't support that at this time. Unfortunately, you need to do the parallel reduction in the ugly way you already described.

However, if such parallel reductions are really frequent, I'd write a construct similar to parallel_reduce in TBB. Implementing such a construct is fairly straightforward. Cilk Plus has a more powerful reducer object, but I haven't checked whether it supports non-POD types.
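As a rough idea of what such a TBB-style helper could look like on top of OpenMP, here is a minimal sketch (the name `parallel_reduce_sketch` and its parameters are hypothetical; this is not the TBB API). Each thread reduces its share of the iterations into a private copy of the accumulator, and the copies are merged one at a time in a critical section. Because the private copies are ordinary local objects, `T` can be any copyable class type with an identity value, including `std::complex<double>`.

```cpp
#include <complex>

// Hypothetical helper: reduces f(0), ..., f(n-1) with `combine`, starting
// from `identity`. T may be any copyable type, POD or not.
template <typename T, typename F, typename Combine>
T parallel_reduce_sketch(long n, T identity, F f, Combine combine) {
    T result = identity;  // shared; only updated inside the critical section
    #pragma omp parallel
    {
        T local = identity;  // private accumulator per thread
        #pragma omp for nowait
        for (long i = 0; i < n; ++i)
            local = combine(local, f(i));
        #pragma omp critical  // merge partial results one thread at a time
        result = combine(result, local);
    }
    return result;
}
```

The combine function must be associative, and because the merge order across threads is unspecified it should also be commutative; for floating-point sums the last bits may differ between runs.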

FYI, a similar restriction applies to the threadprivate pragma. I've tested with VC++ 2008/2010 and the Intel compiler (icc). VC++ doesn't support threadprivate for a struct/class that has a constructor or destructor (or for a scalar variable whose initialization requires a function call); it reports error C3057, "dynamic initialization of 'threadprivate' symbols". You may read this MSDN link as well. icc, however, accepts the C3057 case. So at least two major implementations behave quite differently here.

I guess that supporting parallel reduction on non-POD types would run into a similar problem. To support a parallel reduction, each thread has to allocate a thread-local copy of the reduction variable. So if the reduction variable is non-POD, the implementation may need to call a user-defined constructor. This is the same problem I mentioned for C3057.

minjang
  • Reducers in Cilk Plus are customizable; one can write one's own reducer. So I think there is no restriction on supported types, though I do not remember what requirements the out-of-the-box reducers have. – Alexey Kukanov Aug 24 '11 at 06:32