
When I use Visual Studio to generate AVX2 gather instructions via a compiler intrinsic, it does not insert a VXORPS instruction to break the dependency between the gather and the prior instruction that writes the same YMM register.

The Intel compiler, however, does insert it, and the net result is a noticeable performance improvement because the data dependency is broken.

For reasons that I don't want to go into, I cannot use the Intel compiler, so is there any way that I can "force" Visual Studio to insert that VXORPS instruction?

I already tried creating an intermediate __m256i and applying VXORPS to it, but that did not work.
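To make the pattern concrete, here is a minimal sketch of the two gather forms involved. The plain intrinsic takes no source operand, so the compiler alone decides what the destination register holds beforehand. The masked intrinsic takes an explicit source, which lets the code ask for a zeroed one. The function names (`gather_plain`, `gather_zeroed_src`, `matching_lanes`) are hypothetical, and it is only an assumption that passing a zeroed source makes Visual Studio emit a VXORPS before the gather; the generated assembly would have to be inspected to confirm it.

```cpp
#include <immintrin.h>
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET  // MSVC accepts AVX2 intrinsics without a per-function attribute
#endif

// Plain gather: there is no source operand, so the prior contents of the
// destination register are entirely up to the compiler.
AVX2_TARGET static __m256 gather_plain(const float* base, __m256i idx) {
    return _mm256_i32gather_ps(base, idx, 4);  // scale 4 = sizeof(float)
}

// Masked gather with an explicitly zeroed source and an all-ones mask.
// Assumption: the zeroed source may lead the compiler to zero the
// destination register first; this is not guaranteed.
AVX2_TARGET static __m256 gather_zeroed_src(const float* base, __m256i idx) {
    const __m256 zero = _mm256_setzero_ps();
    const __m256 all_lanes = _mm256_castsi256_ps(_mm256_set1_epi32(-1));
    return _mm256_mask_i32gather_ps(zero, base, idx, all_lanes, 4);
}

// Hypothetical check: counts the lanes in which both variants return the
// expected table element for the given indices.
AVX2_TARGET static int matching_lanes(const int (&indices)[8]) {
    static const float table[16] = {0, 1, 2, 3, 4, 5, 6, 7,
                                    8, 9, 10, 11, 12, 13, 14, 15};
    for (int i : indices) {
        assert(i >= 0 && i < 16);  // an out-of-range index would read outside the table
    }
    const __m256i idx =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(indices));
    float a[8], b[8];
    _mm256_storeu_ps(a, gather_plain(table, idx));
    _mm256_storeu_ps(b, gather_zeroed_src(table, idx));
    int matches = 0;
    for (int i = 0; i < 8; ++i) {
        if (a[i] == table[indices[i]] && b[i] == table[indices[i]]) {
            ++matches;
        }
    }
    return matches;
}
```

Both functions load the same values; any difference would show up only in the instructions emitted around the gather, so comparing the disassembly of the two is the way to tell whether the zeroing was kept.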

rohitsan
  • Rather than just words could you show a little code with the intrinsics? – Z boson Oct 26 '15 at 08:13
  • [Compile your function with GCC using `-mabi=ms` then convert the ELF64 object file to COFF64 and link it into MSVC](http://stackoverflow.com/questions/4770918/converting-c-object-file-from-linux-o-to-windows-obj/21212320#21212320). If you continue to use MSVC for optimization especially with AVX/AVX2/FMA it's going to disappoint you over and over again. – Z boson Oct 27 '15 at 08:46
  • Did you try to use "volatile" with these commands so the compiler will not optimize them away? – ChipK Jul 02 '18 at 05:04
  • You can write the gathering logic manually using intrinsics. Some time ago I wrote gathering logic for matrix multiplication. If you are interested, I can post the snippet. – yadhu Jul 18 '18 at 18:58

0 Answers