When I use Visual Studio to generate AVX2 gather instructions via a compiler intrinsic, it does not insert a VXORPS instruction to zero the gather's destination YMM register first, so the gather keeps a dependency on whichever earlier instruction last wrote that register.
The Intel compiler, however, does insert it, and breaking that data dependency gives a noticeable performance improvement.
For reasons that I don't want to go into, I cannot use the Intel compiler, so is there any way that I can "force" Visual Studio to insert that VXORPS instruction?
I already tried creating an intermediate __m256i and applying VXORPS to it, but that did not work.
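
To make the attempt concrete, here is a minimal sketch of that kind of workaround. It is an illustration, not the exact code: the names `gather8`, `table`, `indices`, and `out` are hypothetical, and it assumes the masked form of the gather intrinsic, with an explicitly zeroed register passed as the source operand and an all-ones mask so every lane is loaded. Whether the compiler keeps the zeroing instruction immediately before the gather is up to its optimizer, so the generated assembly still has to be inspected.

```cpp
#include <immintrin.h>

// GCC/Clang need AVX2 enabled per function (or via -mavx2);
// MSVC accepts the intrinsics without an extra attribute.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// Hypothetical helper: gathers 8 ints from table[] at the given indices.
AVX2_TARGET
static void gather8(const int* table, const int* indices, int* out)
{
    // Load the 8 indices (unaligned load, so any int array works).
    __m256i idx = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(indices));

    // Intermediate register that is explicitly zeroed. The hope is that
    // the compiler emits a zeroing idiom (VXORPS/VPXOR) for it right
    // before the gather, cutting the dependency on the old register value.
    __m256i zero = _mm256_setzero_si256();

    // All-ones mask: the high bit of every lane is set, so all 8 lanes load.
    __m256i mask = _mm256_set1_epi32(-1);

    // Masked gather with the zeroed register as the source/destination.
    // The scale is 4 because the elements are 32-bit ints.
    __m256i result = _mm256_mask_i32gather_epi32(zero, table, idx, mask, 4);

    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), result);
}
```

Calling `gather8(table, indices, out)` with `table[i] = i * 10` and indices `{3, 0, 7, 1, 15, 2, 8, 4}` fills `out` with the corresponding table entries. It requires a CPU with AVX2 support.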