Compiler flags for enhancing the precision of intermediate floating-point calculation

Question

Is there a flag in gcc/clang that specifies the precision of the intermediate floating-point calculation?

Suppose I have a C code

double x = 3.1415926;
double y = 1.414;
double z = x * y;

Is there a compiler flag that allows 'x*y' to be calculated in the highest possible precision of the user's machine, say, long-double (64-bit mantissa), and then truncated back to double (53-bit mantissa, the precision of the declared variable type)?

For information only, I am using Ubuntu 14.04 on a 64-bit machine.

BaseZen · Accepted Answer · 2016-08-07T21:01:50.917

GCC

[Edit on observed behavior of gcc 4.8.4, where default behavior is the opposite to documentation]

You need to make use of the 80-bit registers in the x87 FPU. With with -mfpmath=387 you can override the default use of the SSE registers XMM0-XMM7. This default actually gives you the the IEEE behavior where 64-bit registers are used at every step.

See: https://gcc.gnu.org/wiki/x87note

Thus, by default x87 arithmetic is not true 64/32 bit IEEE, but gets extended precision from the x87 unit. However, anytime a value is moved from the registers to an IEEE 64 or 32 bit storage location, this 80 bit value must be rounded down to the appropriate number of bits.

If your operation is extremely complex, however, register spilling may occur; the FP register stack is only depth 8. So when the spillage copies out to a word-sized RAM location you'll get the rounding then. You'll either need to declare long double yourself this case and round manually at the end, or check the assembler output for explicit spillage.

More information about registers here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly

In particular, XMM0...7 registers, while 128 bits wide, are only so to accommodate two simultaneous 64-bit FP operations. So you want be seeing the stack-operated FPR registers with the FLD (load), FMUL (multiply), and FSTP (store-and-pop) instructions.

So I compiled this code:

double mult(double x, double y) {
    return x * y;
}

with:

gcc -mfpmath=387 -Ofast -o precision.s -S precision.c

And got:

mult:
  .LFB24:
    .cfi_startproc
    movsd   %xmm1, -8(%rsp)
    fldl    -8(%rsp)
    movsd   %xmm0, -8(%rsp)
    fldl    -8(%rsp)
    fmulp   %st, %st(1)
    fstpl   -8(%rsp)
    movsd   -8(%rsp), %xmm0
    ret
    .cfi_endproc

Everything makes perfect sense now. Floating point values are passed via registers XMM0 and XMM1 (although they have to take a bizarre round-trip through memory before they can be put on the FPR stack), and the result is returned in XMM0 in accordance with above Intel reference. Not sure why there isn't a simple FLD instruction directly from XMM0/1 but apparently the instruction set doesn't do that.

If you compare to -mfpmath=sse, there's a lot less to have to do in the latter case, because the operands are ready and waiting in the XMM0/1 registers and it's as simple as a single MULSD instruction.

Thanks so much. BTW, does the "spiling" in your answer mean "overflow"? — zell, Aug 07 '16 at 17:13
No, that's a compiler optimization term meaning the "live" (needed for computation) values are too numerous so they don't all fit in registers, so one must be spilled to memory so make room for another. — BaseZen, Aug 07 '16 at 17:35
Thanks. I checked the binary code produced by the small program presented above. A segment of the generated code is: addsd -8(%rbp), %xmm0, movsd %xmm0, -16(%rbp), movq -16(%rbp), %rax". How do we know the number of bits in the register xmm0? — zell, Aug 07 '16 at 19:36
Answer edited accordingly. I see neither FMUL nor MULSD in your snippet above so I cannot tell which type of arithmetic you've invoked. — BaseZen, Aug 07 '16 at 20:21
“You'll either need to declare long double yourself this case and round manually at the end, or check the assembler output for explicit spillage.”: using `long double` is an excellent solution that does not require `-mfpmath=387`. However, much simpler than checking the assembler for spillage is, when compiling C, to use `-std=c99` or `-std=c11`, which make GCC stick to the description of `FLT_EVAL_METHOD` in C99 and use `long double` precision for all intermediate computations. Then if spillage occurs, a 80-bit `long double` is spilt and the spillage is functionally transparent. — Pascal Cuoq, Aug 07 '16 at 21:36
This does not apply to Clang, which (last time I tried) does not implement C99's `FLT_EVAL_METHOD==2` correctly when passed `-mfpmath=387` but instead some sort of pre-C99 under-specified mess (http://stackoverflow.com/a/18989507/139746 ). Best never use `-mfpmath=387` since Clang's developers are not interested in making that mode work according to the standard (not long ago, it defined `FLT_EVAL_METHOD` as 0, that's how much they care). — Pascal Cuoq, Aug 07 '16 at 21:43

Compiler flags for enhancing the precision of intermediate floating-point calculation

1 Answers1