8

I want to make this code parallel:

std::vector<float> res(n,0);
std::vector<float> vals(m);
std::vector<float> indexes(m);
// fill indexes with values in range [0,n)
// fill vals and indexes
for(size_t i=0; i<m; i++){
  res[indexes[i]] += //something using vas[i];
}

In this article it's suggested to use:

#pragma omp parallel for reduction(+:myArray[:6])

In this question the same approach is proposed in the comments section.

I have two questions:

  1. I don't know m at compile time, and from these two examples it seems that's required. Is it so? Or if I can use it for this case, what do I have to replace ? with in the following command #pragma omp parallel for reduction(+:res[:?]) ? m or n?
  2. Is it relevant that the indexes of the for are relative to indexes and vals and not to res, especially considering that reduction is done on the latter one?

However, If so, how can I solve this problem?

Community
  • 1
  • 1
justHelloWorld
  • 6,478
  • 8
  • 58
  • 138

1 Answers1

15

It is fairly straight forward to do a user declared reduction for C++ vectors of a specific type:

#include <algorithm>
#include <vector>

#pragma omp declare reduction(vec_float_plus : std::vector<float> : \
                              std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<float>())) \
                    initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

std::vector<float> res(n,0);
#pragma omp parallel for reduction(vec_float_plus : res)
for(size_t i=0; i<m; i++){
    res[...] += ...;
}

1a) Not knowing m at compile time is not a requirement.

1b) You cannot use the array section reduction on std::vectors, because they are not arrays (and std::vector::data is not an identifier). If it were possible, you'd have to use n, as this is the number of elements in the array section.

2) As long as you are only reading indexes and vals, there is no issue.

Edit: The original initializer caluse was simpler: initializer(omp_priv = omp_orig). However, if the original copy is then not full of zeroes, the result will be wrong. Therefore, I suggest the more complicated initializer which always creates zero-element vectors.

Zulan
  • 21,896
  • 6
  • 49
  • 109
  • Omp reduction asserts that the compiler need not take care of data overlaps in the case of duplicate indirect indices. – tim18 Apr 02 '17 at 17:06
  • @time18 I don't understand what you mean. Could you elaborate? There is a private copy of `res` for each thread, so there is no issue if there are duplicate indirect indices. – Zulan Apr 03 '17 at 08:42
  • @Zulan Thanks for your answer. Could you please give a look at [this](http://stackoverflow.com/questions/43193566/omp-reduction-on-vector-of-cvmat-or-cvmat-in-general) ? – justHelloWorld Apr 04 '17 at 07:16
  • If we first resize vec to `n`, and then set the elements just like an array, without defining user-declared-reduction, will this involve `false-sharing` and slow down the performance? – avocado Jul 21 '18 at 01:40
  • Just wanted to add: be careful when using this inside a `parallel` region. Then, either a `scope` clause (which isn't supported much yet) or a loop over all threads is necessary to achieve correct results. – RL-S Nov 05 '21 at 13:47