Since OpenMP 4.0, user-defined reduction is supported. So I defined the reduction on std::vector in C++ exactly from here. It works fine with GNU/5.4.0 and GNU/6.4.0, but it returns random values for the reduction with intel/2018.1.163.
This is the example:
#include <iostream>
#include <vector>
#include <algorithm>
#include "omp.h"
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
initializer(omp_priv = omp_orig)
int main() {
omp_set_num_threads(4);
int size = 100;
std::vector<double> w(size,0);
#pragma omp parallel for reduction(vec_double_plus:w)
for (int i = 0; i < 4; ++i)
for (int j = 0; j < w.size(); ++j)
w[j] += 1;
for(auto i:w)
if(i != 4)
std::cout << i << std::endl;
return 0;
}
Each thread adds 1 to all w entries (its local w) and at the end all of them are added to together (reduction). The result for all w entries is 4 with GNU, but random with the intel compiler. Does anyone have any idea what is happening here?