
To get auto-vectorization for C++ code that will run on x86-64 and AArch64 processors, is just adding `#pragma omp simd` to the code sufficient? I plan to compile on Windows using MSVC, on Linux using GCC, and with Clang for iOS/macOS. Or are additional steps needed? Are there any other limitations I should be aware of?

#pragma omp simd
for (int i=0; i<size; ++i)
{
    c[i] = a[i] + b[i];
}
Peter Cordes
Helena
  • a, b and c should have the same element byte-length; size should probably be an integer multiple of (or equal to) the hardware's number of SIMD lanes for variables of that byte-length; and the compiler most probably needs to know that a, b and c do not alias each other. – huseyin tugrul buyukisik May 07 '22 at 09:48
  • 3
    You also have to enable optimization, of course: at least `gcc -O1` for `#pragma omp simd` and `-fopenmp` to do anything, with `-O2` or preferably `-O3` recommended. This loop should vectorize even without OpenMP, with clang or `gcc -O3`, or MSVC `-O2`. OpenMP can get you vectorization even with `gcc -O2`, though, which doesn't try to vectorize by default (until GCC 12). – Peter Cordes May 07 '22 at 10:11
  • 1
    For auto-vectorization with gcc you also need to specify explicitly which architecture you are compiling for. If the binary only needs to run on the same machine you are compiling on, you can use `-march=native`; if not, you need to find an explicit value for `-march=` that is compatible with all hardware the binary should run on. `x86-64-v3` (up to AVX2) should run on most recent hardware, `x86-64-v4` only on server hardware with AVX512 instructions. See more possibilities [here](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). – paleonix May 07 '22 at 13:42
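
The points raised in the comments above (no aliasing, optimization level, OpenMP option, target architecture) can be pulled together in one place. What follows is only a minimal sketch, not a definitive recipe. The function name `add_arrays` and the `float` element type are hypothetical, since the question does not say what `a`, `b` and `c` are. `__restrict` is a non-standard extension that GCC, Clang and MSVC all accept. The build lines at the bottom combine flags quoted in the comments with `-fopenmp-simd` and `/openmp:experimental`, which are assumptions to check against the documentation of your compiler version.

```cpp
#include <cassert>

// Hypothetical helper (not from the question): element-wise c = a + b.
// `__restrict` promises the compiler that the three arrays do not overlap,
// which addresses the aliasing concern raised in the comments.
void add_arrays(const float* __restrict a,
                const float* __restrict b,
                float* __restrict c,
                int size)
{
    assert(size >= 0);

    // Without an OpenMP option the pragma is ignored (possibly with a
    // warning) and the loop is left to the ordinary auto-vectorizer.
    #pragma omp simd
    for (int i = 0; i < size; ++i)
    {
        c[i] = a[i] + b[i];
    }
}

// Possible build lines (assumptions, verify for your toolchain):
//   g++     -O3 -fopenmp-simd -march=x86-64-v3 -c add.cpp   (x86-64 only)
//   clang++ -O3 -fopenmp-simd -c add.cpp
//   cl      /O2 /openmp:experimental /c add.cpp
```

In practice `size` does not have to be a multiple of the vector width for this to be correct: compilers typically emit a scalar remainder loop for the leftover elements. Whether the loop was actually vectorized is best confirmed by inspecting the generated assembly or the compiler's vectorization report rather than assumed from the pragma alone.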

0 Answers