I am trying to understand the conceptual reason why OpenMP breaks loop vectorization. Also any suggestions for fixing this would be helpful. I am considering manually parallelizing this to fix this issue, but that would certainly not be elegant and result in a massive amount of code bloat, as my code consists of several such sections that lend themselves to vectorization and parallelization.
I am using
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.60315.1 for x64
With OpenMP:
info C5002: loop not vectorized due to reason '502'
Without OpenMP:
info C5001: loop vectorized
The VS vectorization page says this error happens when:
Induction variable is stepped in some manner other than a simple +1
Can I force it to step in stride 1?
The loop
#pragma omp parallel for
for (int j = 0; j < H*W; j++)//A,B,C,D,IN are __restricted
{
float Gs = D[j]-B[j];
float Gc = A[j]-C[j];
in[j]=atan2f(Gs,Gc);
}
Best Effort(?)
#pragma omp parallel
{// This seems to vectorize, but it still requires quite a lot of boiler code
int middle = H*W/2;
#pragma omp sections nowait
{
#pragma omp section
for (int j = 0; j < middle; j++)
{
float Gs = D[j]-B[j];
float Gc = A[j]-C[j];
in[j]=atan2f(Gs,Gc);
}
#pragma omp section
for (int j = middle; j < H*W; j++)
{
float Gs = D[j]-B[j];
float Gc = A[j]-C[j];
in[j]=atan2f(Gs,Gc);
}
}
}