Under normal conditions, the loop auto-vectorizer (enabled by default) can also use extended instruction sets (for example, AVX even when the arch is explicitly set to SSE2).
But how can that work if the CPU doesn't support AVX? The compiler inserts a special runtime ISA check (via __isa_available?) for the enhanced instruction set and chooses the code path with supported instructions on demand. It looks similar to the way SSE4.2 instructions are emitted for modern CPUs even when the arch is SSE2.
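To illustrate the idea of a runtime ISA check, here is a hand-written sketch of the same dispatch pattern. It is not the code the compiler generates, and all names (cpu_has_avx, scale_baseline, scale_avx_stand_in, select_scale) are hypothetical. The "AVX" function is only a stand-in with the same scalar body, so the example builds without AVX code generation.

```cpp
#include <cstddef>
#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid, _xgetbv
#endif

// Hypothetical helper: true when both the CPU and the OS support AVX.
bool cpu_has_avx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 1);                                  // leaf 1: feature bits
    const bool osxsave = (regs[2] & (1 << 27)) != 0;   // ECX bit 27
    const bool avx     = (regs[2] & (1 << 28)) != 0;   // ECX bit 28
    if (!osxsave || !avx) return false;
    // XCR0 bits 1 and 2: the OS saves XMM and YMM state.
    return (_xgetbv(0) & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // unknown platform: assume no AVX
#endif
}

using ScaleFn = void (*)(float*, std::size_t, float);

// Baseline path: safe on every CPU the build targets.
void scale_baseline(float* data, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}

// Stand-in for the enhanced path. In a real project this would live in a
// translation unit compiled with AVX code generation enabled.
void scale_avx_stand_in(float* data, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}

// Pick the code path at run time, based on what the CPU supports.
ScaleFn select_scale() {
    return cpu_has_avx() ? scale_avx_stand_in : scale_baseline;
}
```

Calling select_scale() once and reusing the returned pointer keeps the check out of the hot loop. The bug described here amounts to the AVX path being taken without any such check.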
In the latest update (15.5), auto-vectorization is broken, at least in x86 / x64 builds. The compiler doesn't insert the runtime ISA check and emits AVX instructions during loop vectorization (in my case it was vpermilps).
Temporary solutions:
As I suggested in a workaround, you can disable auto-vectorization for a selected loop with:
#pragma loop(no_vector)
for / while / do while ...
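For a concrete picture, here is a minimal sketch of the pragma applied to one loop. The function add_arrays is hypothetical, and the _MSC_VER guard is my addition so that other compilers don't warn about an unknown pragma.

```cpp
#include <cstddef>

// Hypothetical function; only the placement of the pragma matters.
void add_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    // MSVC-specific: the pragma applies to the loop that immediately
    // follows it and asks the compiler to keep that loop scalar.
#if defined(_MSC_VER)
#pragma loop(no_vector)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The pragma has to be repeated before every loop you want to protect; it does not apply to the whole function or file.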
Unfortunately, this is a quick hack: potentially every loop can be vectorized, and it is impractical to insert this pragma everywhere. You may also see a performance drop.
Another temporary solution is to try the internal compiler switch /d2Qvec-sse2only, which restricts auto-vectorization to SSE2 (at least, it should work with Visual Studio 2013). This switch is undocumented and may change without notice.
Update: As mentioned by Cheney Wang, the bug has been sent to the C++ team, so you can track its status in the community item.