TL;DR: When using /favor:AMD64, add /d2vzeroupper to avoid very poor performance of SSE code on both current AMD and current Intel CPUs.
Generally, /d1... and /d2... are "secret" (undocumented) MSVC options that tune compiler behavior. The /d1... options apply to the compiler front end, and the /d2... options apply to the compiler back end.
/d2vzeroupper enables the compiler-generated vzeroupper instruction. See "Do I need to use _mm256_zeroupper in 2021?" for more information.
Normally this option is enabled by default. You can disable it with /d2vzeroupper- (note the trailing minus). See here: https://godbolt.org/z/P48crzTrb
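To make the role of vzeroupper concrete, here is a minimal sketch of a function that uses 256-bit AVX registers and then clears their upper halves by hand before returning to code that may use SSE. The name sum_floats is hypothetical and only for illustration. The explicit _mm256_zeroupper() call shows what the compiler normally inserts for you; it is not a claim that a manual call is required. The AVX path is only compiled when the build enables AVX (for example /arch:AVX on MSVC or -mavx on GCC/Clang); otherwise the scalar loop does all the work.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the original answer): sums n floats.
float sum_floats(const float* data, std::size_t n) {
    float total = 0.0f;
    std::size_t i = 0;
#if defined(__AVX__)
    // Accumulate 8 floats at a time in a 256-bit YMM register.
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    // Clear the upper 128 bits of all YMM registers so that SSE code
    // executed after this point does not pay a transition penalty.
    _mm256_zeroupper();
    for (float lane : lanes) {
        total += lane;
    }
#endif
    // Scalar tail (or the whole array when AVX is not enabled).
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Comparing the disassembly of such a function built with and without /d2vzeroupper- is one way to check whether your toolchain emits the instruction on its own.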
The /favor:AMD64 switch suppresses vzeroupper, so passing /d2vzeroupper explicitly turns it back on.
Up-to-date Visual Studio 2022 has fixed that: /favor:AMD64 still emits vzeroupper, and /d2vzeroupper is no longer needed to enable it.
Reason: the current AMD optimization guides (available from the AMD site; direct pdf link) state:
2.11.6 Mixing AVX and SSE
There is a significant penalty for mixing SSE and AVX instructions when the upper 128 bits of the
YMM registers contain non-zero data. Transitioning in either direction will cause a micro-fault to
spill or fill the upper 128 bits of all 16 YMM registers. There will be an approximately 100 cycle
penalty to signal and handle this fault. To avoid this penalty, a VZEROUPPER or VZEROALL
instruction should be used to clear the upper 128 bits of all YMM registers when transitioning from
AVX code to SSE or unknown code
Older AMD processors did not need vzeroupper, so /favor:AMD64 implemented an optimization for them by omitting it, even though that penalized Intel CPUs. From the MS docs:
/favor:AMD64 (x64 only) optimizes the generated code for the AMD Opteron, and Athlon processors that support 64-bit extensions. The optimized code can run on all x64 compatible platforms. Code that is generated by using /favor:AMD64 might cause worse performance on Intel processors that support Intel64.