Declare Reduction over Vector, Running on 1 thread gives different result than without openmp

Question

I have the following piece of code that gives the wrong answer even if I run it on 1 thread. The code gives the correct result if the two pragmas before the for-loops are commented away. How can this be? I thought on 1 thread, there would be no difference between using OpenMP and not, except possibly some minor overhead. Also, what should I do to get a "correct behaviour"? I don't have the same problem when I have just one for-loop, but with more then 1, it doesn't work as I would think.

#include<iostream>
#include<vector>
#include<algorithm>
#include<omp.h>
using namespace std;
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
                              std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
                    initializer(omp_priv = omp_orig)

int main() {
    vector<int> v;
    vector<double> w(2);
    for (int i = 0; i < 1000; i++) {
        if (i % 2 == 0) {
            v.push_back(0);
        }
        else {
            v.push_back(1);
        }
    }
    #pragma omp parallel for reduction(vec_double_plus:w)
    for (int i = 0; i < 500; i++) {
        int r = v[i];
        w[r] += i;
    }
    #pragma omp parallel for reduction(vec_double_plus:w)
    for (int i = 500; i < 1000; i++) {
        int r = v[i];
        w[r] += i;
    }
    std::cout << w[0] << std::endl;
    std::cout << w[1] << std::endl;
}

I am aware that it looks silly to split the for-loop, and indeed, it runs with the expected behaviour, regardless of the number of threads, if this is made into one for-loop. However, I don't quite understand what is happening when there are two for-loops. — JHadamard, Aug 19 '19 at 15:41
It would be great if you could add the expected and actual results. It's nice that you know what they should be but everyone reading the question has to currently infer it by themselves first. — Max Langhof, Aug 19 '19 at 15:55

score 1 · Accepted Answer · answered Aug 19 '19 at 17:06

The issue is, that the code assume that the original variable from the outside scope is initialized with the neutral element of the reduction - i.e. w is full of zeros. It will create the local copies from this outside and add it again to the original copy. This even happens for a single thread.

You can change the code to initialize omp_priv with zeroes like the following:

initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

The code looks familiar to me, so sorry about the confusion. I'll fix the original answer.

Declare Reduction over Vector, Running on 1 thread gives different result than without openmp

1 Answers1