The following 3 lines give imprecise results with "gcc -Ofast -march=skylake":
int32_t i = -5;
const double sqr_N_min_1 = (double)i * i;
1. - ((double)i * i) / sqr_N_min_1
Obviously, sqr_N_min_1
gets 25.
, and in the 3rd line (-5 * -5) / 25
should become 1.
so that the overall result from the 3rd line is exactly 0.
. Indeed, this is true for compiler options "gcc -O3 -march=skylake".
But with "-Ofast" the last line yields -2.081668e-17
instead of 0.
and with other i
than -5
(e.g. 6
or 7
) it gets other very small positive or negative random deviations from 0.
.
My question is: Where exactly is the source of this imprecision?
To investigate this, I wrote a small test program in C:
#include <stdint.h> /* int32_t */
#include <stdio.h>
#define MAX_SIZE 10
double W[MAX_SIZE];
int main( int argc, char *argv[] )
{
volatile int32_t n = 6; /* try 6 7 or argv[1][0]-'0' */
double *w = W;
int32_t i = 1 - n;
const int32_t end = n - 1;
const double sqr_N_min_1 = (double)i * i;
/* Here is the crucial part. The loop avoids the compiler replacing it with constants: */
do {
*w++ = 1. - ((double)i * i) / sqr_N_min_1;
} while ( (i+=2) <= end );
/* Then, show the results (only the 1st and last output line matters): */
w = W;
i = 1 - n;
do {
fprintf( stderr, "%e\n", *w++ );
} while ( (i+=2) <= end );
return( 0 );
}
Godbolt shows me the assembly produced by an "x86-64 gcc9.3" with the option "-Ofast -march=skylake" vs. "-O3 -march=skylake". Please, inspect the five columns of the website (1. source code, 2. assembly with "-Ofast", 3. assembly with "-O3", 4. output of 1st assembly, 5. output of 2nd assembly):
Godbolt site with five columns
As you can see the differences in the assemblies are obvious, but I can't figure out where exactly the imprecision comes from. So, the question is, which assembler instruction(s) are responsible for this?
A follow-up question is: Is there a possibility to avoid this imprecision with "-Ofast -march=skylake" by reformulating the C-program?