
Is it possible to use XMM register parameters with AVX intrinsic functions (_mm256_*_*)?

My code requires vector integer operations (for loading and storing data) along with vector floating-point operations. The integer code is written with SSE2 intrinsics to stay compatible with older CPUs, while the floating-point code is written with AVX to improve speed (there is also an SSE code branch, so please do not suggest that).

Currently, apart from using a compiler flag that automatically converts all SSE instructions to their VEX-encoded versions, is there any way to use intrinsic functions (i.e. no inline or external assembly) to force the use of VEX-encoded instructions on XMM registers?

Note: I tried _mm256_castsi128_si256(), but this generates instructions with ymm operands.
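A minimal sketch of the kind of kernel described above, assuming GCC or Clang on x86-64 (the function name `scale_i32_to_f32_avx` is hypothetical). The integer side uses 128-bit SSE2 loads, the floating-point side uses 256-bit AVX. Because `_mm256_castsi128_si256()` produces a 256-bit value, it is expected to show up with ymm operands; the `target("avx")` attribute is a GCC/Clang feature (not MSVC) that compiles just this function for AVX.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: converts n int32 values to float and scales them by k.
 * target("avx") (GCC/Clang only) compiles this one function for AVX, so the
 * 128-bit SSE2 load intrinsics below are emitted in VEX form. */
__attribute__((target("avx")))
void scale_i32_to_f32_avx(const int32_t *src, float *dst, size_t n, float k)
{
    const __m256 vk = _mm256_set1_ps(k);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Integer part: two unaligned 128-bit SSE2 loads. */
        __m128i lo = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i hi = _mm_loadu_si128((const __m128i *)(src + i + 4));
        /* Widen to 256 bits: the cast leaves the upper half undefined,
         * then the insert fills it with `hi`. */
        __m256i v = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
        /* Floating-point part: AVX convert and multiply on ymm registers. */
        __m256 f = _mm256_mul_ps(_mm256_cvtepi32_ps(v), vk);
        _mm256_storeu_ps(dst + i, f);
    }
    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        dst[i] = (float)src[i] * k;
}
```

Only AVX1 instructions are used here, since 256-bit integer arithmetic needs AVX2; the 256-bit integer vector is just a container for the conversion.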

innocenat
  • Possible duplicate of [Using AVX CPU instructions: Poor performance without "/arch:AVX"](https://stackoverflow.com/q/7839925/608639) – jww Nov 18 '19 at 06:12
  • That question is even mentioned in the answer here; of course it is not the same! – innocenat Nov 18 '19 at 10:51

1 Answer


You have a processor with AVX. It does not have separate XMM registers; it only has YMM registers, and each XMM register is the lower 128 bits of the corresponding YMM register. If you compile all your code with AVX support (e.g. with -mavx in GCC or /arch:AVX in MSVC), the compiler emits VEX-encoded versions of your SSE2 intrinsics, which operate on the lower 128 bits of the YMM registers. There is nothing to worry about.
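To illustrate the point about compiling with AVX support, here is a sketch assuming GCC or Clang on x86-64 (function names are hypothetical). The two functions contain identical SSE2 intrinsics; the second is compiled for AVX via the GCC/Clang `target("avx")` function attribute, which MSVC does not offer. Inspecting the generated assembly (e.g. `gcc -O2 -S`) should show the legacy `paddd` in the first and the VEX-encoded `vpaddd` on xmm registers in the second.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Compiled with the file's default flags: legacy (non-VEX) SSE2 encoding. */
void add_i32x4_sse2(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Same SSE2 intrinsics, but only this function is compiled for AVX, so the
 * compiler emits their VEX-encoded forms (e.g. vpaddd) on xmm registers. */
__attribute__((target("avx")))
void add_i32x4_vex(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Both functions compute the same result; only the instruction encoding differs, and the VEX version must only be called on CPUs that support AVX.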

However, suppose you have two different modules, one compiled with SSE2 support (e.g. with -msse2 in GCC or /arch:SSE2 in MSVC) and the other with AVX support, and you use functions from both. Then you do have something to worry about when you switch between them: you should call _mm256_zeroupper() or _mm256_zeroall() when you switch from AVX to SSE2 code, unless you want to take a performance hit. See [Using AVX CPU instructions: Poor performance without "/arch:AVX"](https://stackoverflow.com/q/7839925/608639).
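A sketch of the AVX side of such a module boundary, assuming GCC or Clang on x86-64 (the function `sum_f32_avx` is hypothetical). It uses 256-bit registers and calls `_mm256_zeroupper()` before returning, so that SSE2-compiled callers do not pay the AVX-to-SSE transition penalty. GCC and Clang normally insert `vzeroupper` at the end of such a function on their own; the explicit call just makes the intent visible.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical AVX routine intended to be called from SSE2-only code. */
__attribute__((target("avx")))
float sum_f32_avx(const float *x, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));

    /* Reduce 8 lanes to 1: fold the upper 128 bits onto the lower ones,
     * then finish with 128-bit shuffles. */
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    float total = _mm_cvtss_f32(s);

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        total += x[i];

    /* Clear the upper halves of all ymm registers before returning to
     * code that may use legacy (non-VEX) SSE instructions. */
    _mm256_zeroupper();
    return total;
}
```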

The simple solution is to just compile all your code with AVX support. The only reason I can think of to compile different modules with different instruction-set support is if you want to build a CPU dispatcher so your code can run on different processors. That's a bit of a pain to implement, but then each code path runs with a single instruction set, so you don't get state changes. The only time I can think of where you need to worry about a state change is when you call functions from a shared library that was compiled with another instruction set (e.g. a DLL compiled with SSE2). In that case you may need to call _mm256_zeroupper() or _mm256_zeroall() before calling the library function from AVX code.
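A minimal dispatcher sketch, assuming GCC or Clang on x86-64; all names here are hypothetical. In a real project the two code paths would typically live in separate files compiled with different flags (e.g. /arch:AVX vs. the default); here the GCC/Clang `target("avx")` attribute stands in for that. `__builtin_cpu_init` and `__builtin_cpu_supports` are GCC/Clang builtins; MSVC has no equivalent, so a CPUID-based check would be needed there.

```c
#include <stddef.h>

typedef void (*scale_fn)(float *x, size_t n, float k);

/* Baseline path: plain C compiled with the file's default flags. */
static void scale_baseline(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;
}

/* AVX path: same loop, but this function alone targets AVX, so the
 * compiler may vectorize it with VEX-encoded ymm instructions. */
__attribute__((target("avx")))
static void scale_avx(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;
}

/* Choose an implementation once at run time based on CPU support. */
scale_fn select_scale(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? scale_avx : scale_baseline;
}
```

The caller stores the returned pointer once and calls through it afterwards; since each path is compiled for a single instruction set, no SSE/AVX mixing happens inside either path.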

Z boson
  • I already have a dispatcher in place for the SSE2 and SSE3 code paths. ICC can generate VEX-prefixed versions of SSE2 instructions per function, but I can't get MSVC to do this. And since this project is already a DLL by itself (it's an Avisynth plugin), I don't want to load another small DLL. I am wondering if there is an explicit way to instruct the compiler to create a VEX-encoded SSE2 instruction. – innocenat Dec 08 '13 at 08:54
  • You are trying to make a dispatcher and having trouble with MSVC? Then you should add that information to your question. I tried to do this myself a while back [cpu-dispatcher-for-visual-studio-for-avx-and-sse](http://stackoverflow.com/questions/15406658/cpu-dispatcher-for-visual-studio-for-avx-and-sse). – Z boson Dec 08 '13 at 09:44
  • That's not my main point, though it is a consequence of it. I want to know if it's possible to generate VEX-coded SSE2 instructions with intrinsic functions. – innocenat Dec 08 '13 at 09:46
  • This is probably beyond my knowledge. I would naïvely assume that if you compile code using SSE2 intrinsics with AVX support (e.g. with /arch:AVX) then it is VEX-coded. – Z boson Dec 08 '13 at 10:02