
Let's say I have generic C++ code written against the C++ standard. This code is meant to run on 64-bit Windows and 64-bit Linux.

Can we direct the compiler to make use of intrinsics automatically? That is, I don't want to write any of the available intrinsics myself. I want the compiler to automatically use whatever special instructions are available for my normal C++ code. If this is possible, how do I do it in Visual Studio on Windows and on Linux?
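A minimal sketch of the kind of plain, intrinsic-free C++ the comments below talk about: the "two shifts into a rotate" idiom and an integer sum of an array. The function names `rotl32` and `sum_ints` are hypothetical. With optimization enabled, GCC, Clang and MSVC commonly compile the first to a single rotate instruction and auto-vectorize the second, but whether they do depends on compiler version and target, so check the generated asm. Assumed typical builds: on Linux, `g++ -O3 -march=native -fopt-info-vec file.cpp` (`-fopt-info-vec` makes GCC report the loops it vectorized); on Windows, `cl /O2 /arch:AVX2 /Qvec-report:2 file.cpp` (`/Qvec-report:2` makes MSVC report which loops were and were not vectorized). `-march=native` and `/arch:AVX2` let the compiler use instructions that older CPUs may lack, so such a binary may not run on every machine.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate left written in plain, UB-free C++. Masking keeps both shift
// counts in 0..31; optimizing compilers commonly recognize this pattern
// and emit a single rotate instruction (e.g. rol on x86-64).
std::uint32_t rotl32(std::uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

// Integer sum of an array. At -O3 (GCC/Clang) or /O2 (MSVC) this loop is a
// typical auto-vectorization candidate: several elements are added per SIMD
// instruction. Unlike floating-point addition, integer addition can be
// reordered without changing the result, so no extra flags are needed.
int sum_ints(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

The source contains no intrinsics; any SIMD or rotate instructions come purely from the compiler's optimizer and the target flags.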

Asked by Helena; edited by Peter Cordes.
  • Can you give an example of an intrinsic you want to force? Compilers often recognize opportunities like converting two shifts into a rotate intrinsic. If you are looking specifically for SIMD intrinsics, you want to look at "auto-vectorization". – Raymond Chen May 07 '22 at 02:09
  • @RaymondChen: that's exactly what my question is. Let's say I just want to write normal C++ code and let the CPU decide to use the intrinsics if possible. – Helena May 07 '22 at 02:33
  • Let the CPU decide (at run time)? Or let the compiler decide (at build time)? The latter is basically a compiler setting; the former is rather more tricky (you have to have backup code if the specific instructions aren't available). – Adrian Mole May 07 '22 at 02:56
  • It's not clear what you mean by "intrinsic". Everything the compiler does is potentially an intrinsic. There is an "add two integers" intrinsic, which the compiler will probably use when you type `+`. Depending on the compiler optimization flags, it may try to use fancier intrinsics. (Low optimization levels use less fancy intrinsics.) Here is documentation on auto-vectorization in MSVC. https://learn.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-170 – Raymond Chen May 07 '22 at 03:05
  • @RaymondChen: I think the way I viewed it was "I am writing C++ code and the compiler has to convert it to CPU instructions". One way is that I myself tell the compiler to use specific special instructions, BUT another option is that the compiler detects whether a specific piece of code can be handled faster by the available CPU instructions. How can I instruct the compiler on different platforms to use special instructions automatically, without explicitly writing those instructions in my C++ code? How do I instruct MSVC and g++? – Helena May 07 '22 at 04:00
  • @Helena you just compile it. Usually with higher optimization levels. – Taekahn May 07 '22 at 04:20
  • @Helena *"compiler needs to detect if a specific code can be handled faster by available CPU instructions"* -- isn't this what one should expect to happen when compiling for speed? Why do you care *how* the speed is obtained? Or to put it another way: why is "use of intrinsics" more important to you than "highest optimization level"? – JaMiT May 07 '22 at 04:33
  • @JaMiT: Compilers sometimes need an extra nudge to auto-vectorize, e.g. `#pragma omp simd`. For floating-point code, `#pragma omp simd reduction(+:foo)` will even relax strict-fp for that specific variable, letting a compiler vectorize the sum of an array with multiple accumulators (elements of SIMD vectors), instead of summing in order. That's necessary for vectorization without using `-ffast-math` globally, or manually using intrinsics in your source code. – Peter Cordes May 07 '22 at 05:11
  • @Helena: An "intrinsic" is inherently a source-code thing. e.g. `_mm_add_epi32` or `_mm_sad_epu8`. Or as Raymond said, `+` can be seen as an intrinsic for an `add` or `lea` instruction, but of course can compile differently. A compiler optimizing a loop doesn't use inline wrapper functions, it uses the actual *instructions* directly that were also available via intrinsics. All the major mainstream compilers **auto-vectorize** at max optimization, at least for simple problems, e.g. integer sum of array: https://godbolt.org/z/a1jrvbnhc shows clang and MSVC (for AArch64 and x86-64). – Peter Cordes May 07 '22 at 05:18
  • @PeterCordes *"Compilers sometimes need an extra nudge to auto-vectorize"* and *"All the major mainstream compilers auto-vectorize at max optimization"* Huh??? OK, it's not actually a contradiction because the latter is qualified with *"at least for simple problems"*, but still... odd... That being said, I did not intend to suggest that max optimization solves everything, only that it should be tried first (because it is easy). The question's wording leads me to think there is a reasonable possibility that the OP thinks that intrinsics solve everything (which they don't) and skipped `-O3`. – JaMiT May 07 '22 at 21:26
  • @JaMiT: I mean they *try* to auto-vectorize with standard optimization options (e.g. `gcc -O3` includes `-ftree-vectorize`), but give up on some problems (e.g. because of possible aliasing or strict-FP) without extra help like `int *__restrict arr` to promise an output doesn't overlap inputs, or `#pragma omp simd reduction (+:foo)` to pretend summing into `foo` is associative, or compiling with `-ffast-math`. But yes, I upvoted your comment, definitely try `-O3 -march=native` first. But you won't know if there's more speed to be gained if you don't check the asm for auto-vectorization. – Peter Cordes May 08 '22 at 04:55
  • @JaMiT: Or for the querent's other recent question about an actual problem: [How to make use of SIMD capability for sum of squared differences between 8-bit components of RGBA pixels?](https://stackoverflow.com/q/72136987) shows that manual vectorization is basically necessary; I don't think GCC or clang auto-vectorization would do anything nearly as good as what can be done manually, if they even find a way to use any SIMD instructions at all. Although I wouldn't fully rule it out; I didn't check since there wasn't a complete plain-C version ready to compile, just a snippet. – Peter Cordes May 08 '22 at 04:57
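When a loop is not vectorized at the default settings, the comments above mention two source-level nudges that still avoid intrinsics: `__restrict` pointers and `#pragma omp simd reduction(+:...)`. Below is a minimal sketch with hypothetical function names, assuming GCC or Clang built with something like `-O3 -fopenmp-simd` (which enables OpenMP `simd` pragmas without linking the OpenMP runtime). `__restrict` is a compiler extension accepted by GCC, Clang and MSVC, not standard C++. A compiler that does not honor the pragma ignores it (possibly with a warning); the code stays correct, but the float sum may then not be vectorized.

```cpp
#include <cstddef>

// __restrict promises the compiler that out never overlaps a or b, so it can
// vectorize without having to check for (or give up because of) aliasing.
void add_arrays(float* __restrict out, const float* __restrict a,
                const float* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Without -ffast-math, a float sum must be evaluated in source order, which
// blocks vectorization. The simd reduction clause lets the compiler treat
// the additions into `sum` as associative (multiple SIMD accumulators) for
// this loop only, instead of relaxing strict FP for the whole program.
float sum_floats(const float* a, std::size_t n) {
    float sum = 0.0f;
#pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Because the reduction may reorder additions, results for general float inputs can differ in the last bits from a strictly in-order sum; that is the trade-off the pragma opts into.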

0 Answers