I use both GCC 10 and Clang 12 on Ubuntu. I just found that enabling the -ffast-math flag in my C++ project gives roughly a 4x performance improvement.

However, if I only enable -ffast-math at compile time and not at link time, there will be no performance improvement. What does it mean to use -ffast-math when linking, and will it link to any special ffast-math libraries in the system?

P.S.: This improvement actually just brings the performance back to normal. I previously asked a question about the poor performance of AVX instructions on Intel processors. On Linux I now get normal performance as long as I both compile and link with -ffast-math, but on Windows the performance is still poor even with clang and -ffast-math. So I wonder whether I am linking against some special system library on Linux.

TianZerL

1 Answer

However, if I only enable -ffast-math at compile time and not at link time, there will be no performance improvement. What does it mean to use -ffast-math when linking, and will it link to any special ffast-math libraries in the system?

It turns out that gcc does link in crtfastmath.o when -ffast-math is passed at link time (an undocumented feature).

For x86, see https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/i386/crtfastmath.c#L83. It sets the following bits in the MXCSR control register:

#define MXCSR_DAZ (1 << 6)  /* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15) /* Enable flush to zero mode */

Denormalized floating-point numbers are much slower to handle, so disabling them in the CPU makes floating-point computations faster.
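
Where that startup object is not linked in, the same two bits can be set by hand. The following is a minimal sketch, assuming an x86 target with SSE and a CPU that supports DAZ. `enable_ftz_daz` and `ftz_daz_enabled` are hypothetical helper names chosen for illustration, not part of GCC or any library. Whether this explains the Windows case in the question is an assumption that would need measuring.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr (x86 SSE intrinsics)

// Same bit positions as the defines quoted above.
constexpr unsigned int kMxcsrDaz = 1u << 6;   // denormals-are-zero
constexpr unsigned int kMxcsrFtz = 1u << 15;  // flush-to-zero

// Hypothetical helper: turns on FTZ and DAZ for the calling thread.
// Assumption: the CPU supports DAZ (this sketch does not check for it).
inline void enable_ftz_daz() {
    _mm_setcsr(_mm_getcsr() | kMxcsrDaz | kMxcsrFtz);
}

// Hypothetical helper: reports whether both bits are currently set.
inline bool ftz_daz_enabled() {
    const unsigned int mask = kMxcsrDaz | kMxcsrFtz;
    return (_mm_getcsr() & mask) == mask;
}
```

MXCSR is per-thread state, so a call to `enable_ftz_daz()` at the start of `main` affects only the calling thread. Worker threads may need to call it themselves, depending on how the platform initializes the register for new threads.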

From the Intel 64 and IA-32 Architectures Optimization Reference Manual:

6.5.3 Flush-to-Zero and Denormals-are-Zero Modes

The flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are not compatible with the IEEE Standard 754. They are provided to improve performance for applications where underflow is common and where the generation of a denormalized result is not necessary.

3.8.3.3 Floating-point Exceptions in SSE/SSE2/SSE3 Code

Most special situations that involve masked floating-point exceptions are handled efficiently in hardware. When a masked overflow exception occurs while executing SSE/SSE2/SSE3 code, processor hardware can handle it without performance penalty.

Underflow exceptions and denormalized source operands are usually treated according to the IEEE 754 specification, but this can incur significant performance delay. If a programmer is willing to trade pure IEEE 754 compliance for speed, two non-IEEE 754 compliant modes are provided to speed situations where underflows and input are frequent: FTZ mode and DAZ mode.

When the FTZ mode is enabled, an underflow result is automatically converted to a zero with the correct sign. Although this behavior is not compliant with IEEE 754, it is provided for use in applications where performance is more important than IEEE 754 compliance. Since denormal results are not produced when the FTZ mode is enabled, the only denormal floating-point numbers that can be encountered in FTZ mode are the ones specified as constants (read only).

The DAZ mode is provided to handle denormal source operands efficiently when running a SIMD floating-point application. When the DAZ mode is enabled, input denormals are treated as zeros with the same sign. Enabling the DAZ mode is the way to deal with denormal floating-point constants when performance is the objective.

If departing from the IEEE 754 specification is acceptable and performance is critical, run SSE/SSE2/SSE3 applications with FTZ and DAZ modes enabled.
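
The numerical effect of the two modes described in the manual excerpt can be observed directly. This is a rough sketch, assuming an x86-64 build where `float` arithmetic uses SSE instructions. The function names (`set_ftz_daz`, `half_of_smallest_normal`, `scaled_subnormal`) are hypothetical and exist only for this illustration; the `_MM_SET_*` macros come from the compiler's intrinsics headers.

```cpp
#include <limits>
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

// Hypothetical helper: switches both modes on or off for the calling thread.
inline void set_ftz_daz(bool on) {
    _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
                                   : _MM_DENORMALS_ZERO_OFF);
}

// FTZ probe: the inputs are normal, but the exact product is subnormal.
// volatile keeps the compiler from folding the multiplication at compile time.
inline float half_of_smallest_normal() {
    volatile float smallest = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return smallest * half;
}

// DAZ probe: the first operand is subnormal, the exact product is normal.
inline float scaled_subnormal() {
    volatile float tiny = std::numeric_limits<float>::denorm_min();
    volatile float scale = 1e30f;
    return tiny * scale;
}
```

With both modes off, both probes should return small positive values. With both modes on, both should return zero: the first because the subnormal result is flushed, the second because the subnormal input is read as zero.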

Maxim Egorushkin