Sorry, OpenMP simply doesn't support that at this time. Unfortunately, you need to do the parallel reduction manually, in the ugly way you already described.
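I don't know exactly what your version looks like. One common manual pattern gives each thread its own slot in an array of partial results and merges the slots serially after the parallel region. The sketch below assumes a made-up non-POD accumulator, `Histogram` (letter counts stored in a `std::vector`), and a hypothetical function `count_letters`. The `omp.h` calls are guarded by `_OPENMP`, so the code also compiles and runs serially without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical non-POD accumulator: per-letter histogram backed by a
// std::vector, so it has a non-trivial constructor and destructor.
struct Histogram {
    std::vector<long> bins;
    explicit Histogram(std::size_t n = 26) : bins(n, 0) {}
    void merge(const Histogram& other) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Manual "reduction": one private Histogram per thread, merged serially afterwards.
Histogram count_letters(const std::string& text) {
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    std::vector<Histogram> partials(nthreads);  // one slot per thread
    const long n = static_cast<long>(text.size());

#pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        Histogram& local = partials[tid];  // only this thread writes to this slot
#pragma omp for
        for (long i = 0; i < n; ++i) {
            char c = text[static_cast<std::size_t>(i)];
            if (c >= 'a' && c <= 'z') ++local.bins[c - 'a'];
        }
    }

    Histogram total;
    for (std::size_t t = 0; t < partials.size(); ++t) total.merge(partials[t]);  // serial combine
    return total;
}
```

Because every thread writes only to its own slot, no locking is needed inside the loop. The price is the extra bookkeeping with thread ids, which is exactly the ugliness a built-in reduction clause would hide.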
However, if such parallel reductions are really frequent in your code, I'd write a reusable construct similar to parallel_reduce in TBB. Implementing such a construct is fairly straightforward. Cilk Plus has a more powerful reducer object, but I haven't checked whether it supports non-POD types.
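Here is a minimal sketch of such a construct, written with plain OpenMP pragmas. The name `omp_parallel_reduce` and its parameters are hypothetical, and the signature is not TBB's API. Each thread copies `identity` into a private accumulator, folds its share of the iterations into it, and then merges it into the shared result inside a `critical` section. Because the merge order is unspecified, `join` should be both associative and commutative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, loosely modelled on the idea of TBB's parallel_reduce:
//   identity : neutral value each thread's private accumulator starts from
//   body     : body(acc, i) folds iteration i into the thread's accumulator
//   join     : join(total, acc) merges one thread's accumulator into the result
// T may be any copyable type, including non-POD types such as std::vector.
template <class T, class Body, class Join>
T omp_parallel_reduce(long n, const T& identity, Body body, Join join) {
    T result = identity;
#pragma omp parallel
    {
        T local = identity;  // private copy per thread: T's copy constructor runs here
#pragma omp for nowait
        for (long i = 0; i < n; ++i) body(local, i);
#pragma omp critical
        join(result, local);  // one thread at a time merges into the shared result
    }
    return result;
}
```

For example, you can pass `std::vector<long>(4, 0)` as the identity, use a body that does `++acc[i % 4]`, and use a join that adds the vectors element-wise. That gives you a parallel histogram without writing the per-thread bookkeeping each time.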
FYI, the same kind of restriction also exists for the threadprivate pragma. I've tested it with VC++ 2008/2010 and the Intel compiler (icc). VC++ doesn't support threadprivate on a struct/class that has a constructor or destructor (or on a scalar variable whose initialization requires a function call); it reports error C3057, "dynamic initialization of 'threadprivate' symbols". You may also want to read the MSDN documentation for C3057. icc, however, accepts the code that triggers C3057. So at least two major implementations behave quite differently here.
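To make the C3057 case concrete, the commented-out lines below show the kind of declaration VC++ rejects. The live code shows one workaround I'd expect to be portable, though this is my assumption and not something the OpenMP spec prescribes. Only a plain pointer is made threadprivate, because its constant initializer needs no dynamic initialization, and each thread constructs its own object lazily on first use. The names `tls_buffer` and `thread_buffer` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// VC++ rejects this form with C3057, because std::string needs
// dynamic initialization:
//     static std::string buffer;
//     #pragma omp threadprivate(buffer)

// Workaround sketch: each thread gets its own copy of the pointer,
// and every copy starts out as NULL (constant initialization).
static std::string* tls_buffer = NULL;
#pragma omp threadprivate(tls_buffer)

// Returns the calling thread's buffer, constructing it on first use.
std::string& thread_buffer() {
    if (tls_buffer == NULL) tls_buffer = new std::string();  // once per thread
    return *tls_buffer;
}
// Note: these per-thread objects are never deleted here; a real program
// would need to free each thread's buffer itself.
```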
I guess that supporting parallel reduction on non-POD types would run into a similar problem. To implement a reduction, each thread in the parallel region has to allocate its own private copy of the reduction variable. So if the reduction variable is non-POD, the implementation may need to call a user-defined constructor for each of those copies. That is the same dynamic-initialization problem I mentioned for C3057.
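To see the constructor calls such a lowering would involve, here is a rough hand-written sketch of what `reduction(+: total)` would have to mean for a non-POD type. This is my own illustration, not how any particular compiler implements reductions. The hypothetical type `Counted` counts its constructions, so you can observe one constructor call for the shared result plus one for every thread's private copy.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical non-POD reduction type that counts how often it is constructed.
struct Counted {
    static std::atomic<int> constructions;
    long value;
    Counted() : value(0) { ++constructions; }  // user-defined constructor
    Counted(const Counted& o) : value(o.value) { ++constructions; }
    Counted& operator+=(const Counted& o) { value += o.value; return *this; }
};
std::atomic<int> Counted::constructions(0);

// Hand-written equivalent of a "reduction(+: total)" on a non-POD type.
long sum_with_private_copies(long n) {
    Counted total;  // shared result: one constructor call
#pragma omp parallel
    {
        Counted priv;  // private copy: one constructor call per thread
#pragma omp for nowait
        for (long i = 0; i < n; ++i) priv.value += i;
#pragma omp critical
        total += priv;  // combine step, serialized
    }
    return total.value;
}
```

Running `sum_with_private_copies` with N threads adds N + 1 to `Counted::constructions`. Those per-thread constructor calls are exactly the kind of dynamic initialization that a compiler like VC++ would have to emit on the user's behalf.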