I am trying to compute the average over adjacent elements within a matrix, but I am stuck getting OpenMP's vectorization to work. As I understand it, the reduction clause on the second nested for-loop should ensure that no race conditions occur when writing to elements of next. However, when compiling the code (I tried auto-vectorization with both GCC 7.3.0 and ICC, and OpenMP > 4.5), I get the error: "error: reduction variable "next" must be shared on entry to this OpenMP pragma". Why does this occur when variables are shared by default? How can I fix this issue, since adding shared(next) does not seem to help?
// CODE ABOVE (...)
size_t const width = 100;
size_t const height = 100;
float * restrict next = malloc(sizeof(float)*width*height);

// INITIALIZATION OF 'next' (this works fine)
#pragma omp for simd collapse(2)
for(size_t j = 1; j < height-1; j++)
    for(size_t i = 1; i < width-1; i++)
        next[j*width+i] = 0.0f;

// COMPUTE AVERAGE FOR INNER ELEMENTS
#pragma omp for simd collapse(4) reduction(+:next[0:width*height])
for(size_t j = 1; j < height-1; ++j){
    for(size_t i = 1; i < width-1; ++i){
        // compute average over adjacent elements
        for(size_t _j = 0; _j < 3; _j++) {
            for(size_t _i = 0; _i < 3; _i++) {
                next[j*width + i] += (1.0 / 9.0) * next[(j-1 +_j)*width + (i-1 + _i)];
            }
        }
    }
}
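
For comparison, here is a minimal sketch of a variant that needs no reduction clause at all. It rests on assumptions that are not in the code above: the neighbours are read from a separate input buffer (the hypothetical prev), the loop is wrapped in a hypothetical helper average_inner, and a combined parallel for replaces the orphaned omp for. Under those assumptions each (j, i) iteration writes exactly one distinct element of next, so only a private scalar accumulator is required. I cannot say whether this matches the intended in-place behaviour, since the loop above reads and writes the same array.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original code): writes the 3x3
 * neighbourhood average of `prev` into the inner elements of `next`.
 * Assumes `prev` and `next` are distinct buffers of width*height floats. */
static void average_inner(size_t width, size_t height,
                          const float *restrict prev, float *restrict next)
{
    /* Rows are distributed across threads; each iteration writes a
     * different element of `next`, so no reduction clause is used. */
    #pragma omp parallel for
    for (size_t j = 1; j < height - 1; ++j) {
        /* Ask the compiler to vectorize along the contiguous dimension. */
        #pragma omp simd
        for (size_t i = 1; i < width - 1; ++i) {
            float sum = 0.0f;  /* private per-iteration accumulator */
            for (size_t dj = 0; dj < 3; ++dj) {
                for (size_t di = 0; di < 3; ++di) {
                    sum += prev[(j - 1 + dj) * width + (i - 1 + di)];
                }
            }
            next[j * width + i] = sum / 9.0f;
        }
    }
}
```

Calling average_inner(width, height, prev, next) leaves the border elements of next untouched, as in the loops above. Built without OpenMP support, the pragmas are ignored and the function runs serially with the same result.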