I just experimented with the SSE option "denormals are zeros" (DAZ) by setting it with _mm_setcsr( _mm_getcsr() | 0x40 ).
I found an interesting thing: this doesn't prevent SSE from producing denormals when both operands are non-denormal! It just makes SSE treat denormal operands as if they were zeros.
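For example, here is a minimal sketch of my own that shows both effects; it assumes an x64 build, where scalar double arithmetic goes through SSE so that MXCSR applies:

#include <xmmintrin.h>
#include <iostream>
using namespace std;

int main()
{
    // DAZ = MXCSR bit 6 (0x40): denormal *inputs* are read as zero.
    _mm_setcsr( _mm_getcsr() | 0x40 );

    // Two normal operands whose product underflows into the denormal range:
    // DAZ does not flush the result, so a denormal is still produced.
    volatile double a = 1e-300, b = 1e-20;
    cout << a * b << endl;      // prints roughly 1e-320, not 0

    // A denormal operand, however, is treated as zero:
    volatile double d = 5e-324; // smallest positive denormal
    cout << d * 2.0 << endl;    // prints 0 under DAZ
    return 0;
}

(The volatile qualifiers are only there to keep the compiler from folding the multiplications at compile time.)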
As I explained, I know what this option does. But what is it actually good for?
Addendum
I just read the Intel article linked by user nucleon, and I became curious about the performance impact of denormals on SSE computations.
So I wrote a little Windows program to test this:
#include <windows.h>
#include <intrin.h>
#include <iostream>
using namespace std;

// Reinterpret a 64-bit integer bit pattern as a double.
union DBL
{
    DWORDLONG dwlValue;
    double    value;
};

int main()
{
    DWORDLONG dwlTicks;
    DBL       d;
    double    sum;

    // First loop: the bit patterns 0 .. 1e8-1 are all denormal doubles.
    dwlTicks = __rdtsc();
    for( d.dwlValue = 0, sum = 0.0; d.dwlValue < 100000000; d.dwlValue++ )
        sum += d.value;
    dwlTicks = __rdtsc() - dwlTicks;
    cout << sum << endl;
    cout << dwlTicks / 100000000.0 << endl;   // TSC ticks per iteration

    // Second loop: starting at 0x0010000000000000 (the smallest normal
    // double), the same number of bit patterns are all normal doubles.
    dwlTicks = __rdtsc();
    for( d.dwlValue = 0x0010000000000000u, sum = 0.0;
         d.dwlValue < (0x0010000000000000u + 100000000); d.dwlValue++ )
        sum += d.value;
    dwlTicks = __rdtsc() - dwlTicks;
    cout << sum << endl;
    cout << dwlTicks / 100000000.0 << endl;   // TSC ticks per iteration

    return 0;
}
(I printed the sums only to prevent the compiler from optimizing away the summation.)
The result is that on my Xeon E3-1240 (Skylake), each iteration takes four clock cycles when "d" is non-denormal. When "d" is a denormal, each iteration takes about 150 clock cycles! I would never have believed that denormals could have such a huge performance impact if I hadn't seen it myself.
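Out of curiosity, here is a hedged follow-up sketch (my own addition, with a hypothetical helper name EnableDazFtz): setting FTZ (MXCSR bit 15) together with DAZ (bit 6) before the timed loops should remove the penalty, because denormal inputs are then read as zero and denormal results are flushed to zero, so the slow denormal-handling path is never taken.

#include <xmmintrin.h>

// Hypothetical helper: turn on both DAZ (bit 6) and FTZ (bit 15) in MXCSR.
// DAZ makes SSE read denormal inputs as zero; FTZ flushes denormal results
// to zero.
static void EnableDazFtz()
{
    _mm_setcsr( _mm_getcsr() | 0x8040 );
}

Calling EnableDazFtz() at the top of main() should make the first (denormal) loop run at roughly the speed of the second one, at the cost of its sum coming out as 0.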