OpenMP 4 simd vectorization for c=c+a*b

Question

I do not know whether OpenMP 4 supports this for loop or not. The speed with and without the pragma is the same.

#pragma omp for simd
for (size_t i = 0; i < col; i++)
{
    C[i] += A[i]* B[i];
}

What compiler options did you use? What compiler? What OS? What hardware? `omp simd` is rather pointless in my opinion with x86 compilers due to auto-vectorization. GCC uses auto-vectorization with `-O3`. Try comparing with and without using SIMD with `-O2`. What size is `col`? If it's very large then this operation is memory bandwidth bound anyway. — Z boson, Dec 24 '15 at 14:30

score 3 · Answer 1 · edited May 23 '17 at 12:31

The reason (I guess) for the pragma having no effect is twofold:

The code already vectorises without the simd directive; and
The code is memory bound anyway, so adding more threads to compute it won't make much difference unless it gives you access to more memory bandwidth. See this excellent answer for more details.
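
One way to check the first point is to compile the same loop with and without the directive and compare. The following is only a minimal sketch: the element type `double` and the function names `fma_plain`, `fma_simd` and `check_variants` are assumptions, since the question does not show the array declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: no directive. The compiler may still auto-vectorize this loop
   (with GCC, typically at -O3). Element type double is an assumption. */
void fma_plain(double *C, const double *A, const double *B, size_t col)
{
    for (size_t i = 0; i < col; i++)
        C[i] += A[i] * B[i];
}

/* Same loop with an explicit simd request. The pragma is only honoured
   when OpenMP (or OpenMP simd) support is enabled at compile time;
   otherwise it is ignored and this is identical to fma_plain. */
void fma_simd(double *C, const double *A, const double *B, size_t col)
{
    #pragma omp simd
    for (size_t i = 0; i < col; i++)
        C[i] += A[i] * B[i];
}

/* Sanity check on small integer-valued data, where every result is exact,
   so both variants must agree bit for bit. */
void check_variants(void)
{
    double A[8], B[8], C1[8], C2[8];
    for (size_t i = 0; i < 8; i++) {
        A[i] = (double)i;
        B[i] = 2.0;
        C1[i] = C2[i] = 1.0;
    }
    fma_plain(C1, A, B, 8);
    fma_simd(C2, A, B, 8);
    for (size_t i = 0; i < 8; i++)
        assert(C1[i] == C2[i]);
}
```

With GCC, building this once with `-O2 -fopenmp-simd` and once with `-O3`, each time adding `-fopt-info-vec`, should report which of the two loops got vectorised. As for the second point: assuming `double` elements, each iteration moves at least 24 bytes (two loads and one store) for only two floating-point operations, so for a large `col` the loop is likely to be limited by memory bandwidth rather than by arithmetic.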

edited May 23 '17 at 12:31

Community

answered Dec 24 '15 at 10:33

Gilles
