I am benchmarking a set of applications on a Sandy Bridge processor (i7-3820). The benchmark comes in two versions containing the same code; the only difference is that the first version uses SSE/SSE2 intrinsics and the second version uses AVX intrinsics.
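To give an idea of what the two versions look like, here is a heavily simplified, hypothetical stand-in for one of the kernel pairs (the real kernels are larger, but they follow the same pattern: the SSE version works on 128-bit __m128 vectors, the AVX version on the corresponding 256-bit __m256 ones):

```cpp
// Simplified example only, not the actual benchmark code.
#include <immintrin.h>
#include <cstddef>

// SSE version: element-wise add, 4 floats per iteration (n assumed a multiple of 4).
void add_sse(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}

// AVX version: same operation, 8 floats per iteration (n assumed a multiple of 8).
void add_avx(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
}
```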
I compile the benchmark with Visual Studio 2015.
When I compile the SSE version for either x64 or x86, the execution time is almost the same. But when I compile the AVX version for x64, the execution time is worse (almost double) compared to the same AVX version compiled for x86. Furthermore, the AVX version compiled for x86 achieves only a small speedup (around 8%) over the SSE version.
Finally, I tested the same configurations on an Ivy Bridge processor (i7-3770): there the execution times of the AVX version were the same for x64 and x86, but the AVX intrinsics still showed no improvement over the SSE ones.
Is there an explanation for the poor performance of the AVX intrinsics on Sandy Bridge when compiling for x64?
And why does neither architecture show any speedup for the AVX instructions over the SSE instructions?
Moreover, I tried different compilations, switching from /arch:AVX to /arch:SSE2 and vice versa, but the execution times did not change. If I am right, though, the 'Enable Enhanced Instruction Set' property in Visual Studio only affects the compiler's auto-vectorization.
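For reference, the switches I am toggling correspond roughly to these command lines (benchmark.cpp is just a placeholder for my sources; in the project I set them through the 'Enable Enhanced Instruction Set' property):

```
rem x86 build with SSE2 code generation (SSE2 is already the default for x64 builds)
cl /O2 /arch:SSE2 benchmark.cpp

rem AVX code generation (available for both x86 and x64)
cl /O2 /arch:AVX benchmark.cpp
```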
Thanks in advance.