See the update (4 Aug 2022) at the end of this entry.
To flush denormals to zero, use the Intel intrinsics macros during program startup. For example:
#include <immintrin.h>
int main() {
    // Set the flush-to-zero (FTZ) bit in MXCSR for the calling thread.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}
In my version of MSVC, this emitted the following assembly code:
stmxcsr DWORD PTR tv805[rsp]
mov eax, DWORD PTR tv805[rsp]
bts eax, 15
mov DWORD PTR tv807[rsp], eax
ldmxcsr DWORD PTR tv807[rsp]
MXCSR is the SSE control and status register, and this code sets bit 15 (the FTZ flag), which turns flush-to-zero mode on.
One thing to note: this only affects denormals that result from a computation. If you also want denormals treated as zero when they are used as inputs, you need to set the DAZ (denormals-are-zero) flag as well, using the following macro:
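To see the effect of bit 15, here is a minimal sketch. It assumes an x86-64 build where float arithmetic uses SSE instructions; the helper names halve_smallest_normal and flush_zero_enabled are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cfloat>       // FLT_MIN
#include <cmath>        // std::fpclassify, FP_SUBNORMAL
#include <immintrin.h>  // _mm_getcsr, _MM_SET_FLUSH_ZERO_MODE

// Hypothetical helper: multiplies the smallest normal float by 0.5.
// The exact result (2^-127) is a denormal, so with FTZ on it is flushed to 0.
// volatile forces the multiplication to happen at run time.
inline float halve_smallest_normal() {
    volatile float smallest_normal = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = smallest_normal * half;
    return result;
}

// Hypothetical helper: true if MXCSR bit 15 (FTZ) is set in the calling thread.
inline bool flush_zero_enabled() {
    return (_mm_getcsr() & 0x8000u) != 0;
}
```

With FTZ off, halve_smallest_normal() should return a denormal; after _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON) it should return 0.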
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
See https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags for more information.
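The difference between FTZ and DAZ can be sketched as follows. This again assumes x86-64 with SSE float arithmetic, and scale_smallest_denormal is a hypothetical helper name. The input is a denormal but the exact product is a normal number, so FTZ alone leaves the result untouched, while DAZ makes the input count as zero.

```cpp
#include <cassert>
#include <limits>       // std::numeric_limits
#include <immintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_SET_DENORMALS_ZERO_MODE

// Hypothetical helper: multiplies the smallest denormal float (2^-149)
// by 2^30. The exact product, 2^-119, is a normal float, so only the
// *input* is denormal here. volatile forces a run-time multiplication.
inline float scale_smallest_denormal() {
    volatile float tiny = std::numeric_limits<float>::denorm_min();
    volatile float scale = 1073741824.0f;  // 2^30
    volatile float result = tiny * scale;
    return result;
}
```

With only FTZ set, the call should return a small positive number; once DAZ is also set, it should return 0.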
Also note that MXCSR is per-thread state, so you need to set these flags in every thread that should use them.
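One way to handle the per-thread requirement is to set the flags as the first statement of every thread function. The sketch below assumes x86-64 and std::thread; enable_ftz_daz_for_this_thread and run_worker_and_get_mxcsr are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <thread>
#include <immintrin.h>  // _mm_getcsr and the _MM_SET_* macros

// Hypothetical helper: turns on FTZ and DAZ for the calling thread only.
inline void enable_ftz_daz_for_this_thread() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Hypothetical helper: starts a worker that configures its own MXCSR
// before doing anything else, then reports the MXCSR value it ended up with.
inline unsigned int run_worker_and_get_mxcsr() {
    unsigned int worker_mxcsr = 0;
    std::thread worker([&worker_mxcsr] {
        enable_ftz_daz_for_this_thread();  // first thing the thread does
        worker_mxcsr = _mm_getcsr();
        // ... the thread's floating-point work would go here ...
    });
    worker.join();
    return worker_mxcsr;
}
```

Setting the flags explicitly in each thread avoids relying on whatever MXCSR value a new thread happens to start with on a given platform.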
Update 4 Aug 2022
I've now had to deal with ARM processors as well. The following is a cross-platform macro that works on ARM and Intel (the ARM branch reads and writes the FPCR register, so it assumes AArch64):
#ifndef __ARM_ARCH
extern "C" {
extern unsigned int _mm_getcsr();
extern void _mm_setcsr(unsigned int);
}
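/* 0x8040 = FTZ (bit 15) | DAZ (bit 6) in MXCSR */
#define MY_FAST_FLOATS _mm_setcsr(_mm_getcsr() | 0x8040U)
#else
#include <cstdint> /* for uint64_t, used by MY_FAST_FLOATS below */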
#define MY_FPU_GETCW(fpcr) __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr))
#define MY_FPU_SETCW(fpcr) __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr))
#define MY_FAST_FLOATS \
    { \
        uint64_t eE2Hsb4v {}; /* random name to avoid shadowing warnings */ \
        MY_FPU_GETCW(eE2Hsb4v); \
        eE2Hsb4v |= (1 << 24) | (1 << 19); /* FZ flag, FZ16 flag; flush denormals to zero */ \
        MY_FPU_SETCW(eE2Hsb4v); \
    } \
    static_assert(true, "require semi-colon after macro with this assert")
#endif