
It is nearly impossible(*) to provide strict IEEE 754 semantics at reasonable cost when the only floating-point instructions one is allowed to use are the 387 ones. It is particularly hard when one wishes to keep the FPU working on the full 64-bit significand so that the long double type is available for extended precision. The usual “solution” is to do intermediate computations at the only available precision, and to convert to the lower precision at more or less well-defined occasions.

Recent versions of GCC handle excess precision in intermediate computations according to the interpretation laid out by Joseph S. Myers in a 2008 post to the GCC mailing list. This description makes a program compiled with gcc -std=c99 -mno-sse2 -mfpmath=387 completely predictable, to the last bit, as far as I understand. And if by chance it isn't, it is a bug and it will be fixed: Joseph S. Myers' stated intention in his post is to make it predictable.

Is it documented how Clang handles excess precision (say when the option -mno-sse2 is used), and where?

(*) EDIT: this is an exaggeration. It is slightly annoying but not that difficult to emulate binary64 when one is allowed to configure the x87 FPU to use a 53-bit significand.
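
For reference, here is a minimal sketch of that configuration, assuming glibc's <fpu_control.h> (the macros below are glibc's; note that even with a 53-bit significand the exponent range stays extended, so behavior near underflow and overflow still needs care):

#include <fpu_control.h>

/* Switch the x87 FPU to a 53-bit significand so that each individual
   operation rounds its significand the way a binary64 operation would.
   The extended exponent range remains, hence "slightly annoying". */
static void x87_set_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(cw);
}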


Following a comment by R.. below, here is the log of a short interaction with the most recent version of Clang I have:

Hexa:~ $ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
Hexa:~ $ cat fem.c
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>

double x;
double y = 2.0;
double z = 1.0;

int main(){
  x = y + z;
  printf("%d\n", (int) FLT_EVAL_METHOD);
}
Hexa:~ $ clang -std=c99 -mno-sse2 fem.c
Hexa:~ $ ./a.out 
0
Hexa:~ $ clang -std=c99 -mno-sse2 -S fem.c
Hexa:~ $ cat fem.s 
…
    movl    $0, %esi
    fldl    _y(%rip)
    fldl    _z(%rip)
    faddp   %st(1)
    movq    _x@GOTPCREL(%rip), %rax
    fstpl   (%rax)
…
Pascal Cuoq
    I don’t *think* that there’s an official policy; `-no-sse2` is not an option of interest to most of the main clang devs. cfe-dev is probably the right place to ask this question. – Stephen Canon Jul 15 '13 at 21:08
    Formally, I believe the C standard requires the modern GCC behavior if `FLT_EVAL_METHOD` is defined as 2. Other strange excess-precision variants would require a negative value of `FLT_EVAL_METHOD`. However, this doesn't necessarily mean clang conforms... – R.. GitHub STOP HELPING ICE Jul 15 '13 at 21:17
  • Anyway +1 very nice question. – R.. GitHub STOP HELPING ICE Jul 15 '13 at 21:18
  • @R.. I did not even think of checking whether `FLT_EVAL_METHOD` followed my command-line instructions, I only looked at the assembly. That kind of answers my question (I mean, if it was defined as negative, that would be all the more reason to document how excess precision works in Clang. On the other hand, if the developers didn't bother to set a value other than the default of 0, it means they mostly do not care). – Pascal Cuoq Jul 15 '13 at 21:30
    @PascalCuoq If you haven't already, digging through the header files and macro definitions confirms that `FLT_EVAL_METHOD` is set to 0 as a default. – charmlessCoin Aug 17 '13 at 20:19
    @charmlessCoin There has been some work on having the right macro definition two years ago: https://github.com/llvm-mirror/clang/commit/b406669fea7c8db83a377f368f1689c848296974 Now to understand what the current status is. – Pascal Cuoq Aug 22 '13 at 13:45
    Anyone know whether clang's `--disable-excess-fp-precision` is equivalent to gcc's `-fexcess-precision=standard`? – letmaik Jul 31 '18 at 15:34

2 Answers


This does not answer the originally posed question, but if you are a programmer working with similar issues, this answer might help you.

I really don't see where the perceived difficulty is. Providing strict IEEE-754 binary64 semantics while being limited to 80387 floating-point math, and retaining 80-bit long double computation, seems to be just a matter of following the well-specified C99 casting rules, with both GCC-4.6.3 and clang-3.0 (based on LLVM 3.0).

Edited to add: Yet, Pascal Cuoq is correct: neither gcc-4.6.3 nor clang-llvm-3.0 actually enforces those rules correctly for '387 floating-point math. Given the proper compiler options, the rules are correctly applied to expressions evaluated at compile time, but not to run-time expressions. There are workarounds, listed after the break below.

I write molecular dynamics simulation code, and am very familiar with the repeatability/predictability requirements, as well as with the desire to retain the maximum precision available when possible, so I do claim to know what I am talking about here. This answer should show that the tools exist and are simple to use; the problems arise from not being aware of, or not using, those tools.

(A favorite example of mine is the Kahan summation algorithm. With C99 and proper casting (adding casts to e.g. the Wikipedia example code), no tricks or extra temporary variables are needed at all. The implementation works regardless of compiler optimization level, including at -O3 and -Ofast.)
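
As an illustration (a sketch of mine, not from any particular codebase), here is the textbook Kahan loop with the explicit (double) casts added; per the edit above, keep in mind that gcc-4.6.3 and clang-llvm-3.0 did not actually honor such casts at run time on the '387:

#include <stddef.h>

/* Kahan summation with explicit casts: each (double) is meant to strip
   excess precision per C99, so that the compensation c captures exactly
   the low-order bits lost by each addition. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;                 /* running compensation */
    for (size_t i = 0; i < n; i++) {
        double y = (double)(x[i] - c);
        double t = (double)(sum + y);
        c = (double)((double)(t - sum) - y);
        sum = t;
    }
    return sum;
}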

C99 explicitly states (e.g. in 5.2.4.2.2) that casting and assignment both remove all extra range and precision. This means that you can use long double arithmetic by defining your temporary variables used during computation as long double, also casting your input variables to that type; whenever an IEEE-754 binary64 is needed, just cast to a double.

On '387, the cast generates an assignment and a load with both of the above compilers; this does correctly round the 80-bit value to IEEE-754 binary64. This cost is very reasonable in my opinion. The exact time taken depends on the architecture and the surrounding code; usually it can be interleaved with other code to bring the cost down to negligible levels. When MMX, SSE or AVX is available, their registers are separate from the 80-bit 80387 registers, and the cast is usually done by moving the value to an MMX/SSE/AVX register.

(I prefer production code to use a specific floating-point type, say tempdouble or such, for temporary variables, so that it can be defined to either double or long double depending on architecture and speed/precision tradeoffs desired.)

In a nutshell:

Don't assume (expression) is of double precision just because all the variables and literal constants are. Write it as (double)(expression) if you want the result at double precision.

This applies to compound expressions, too, and may sometimes lead to unwieldy expressions with many levels of casts.

If you have expr1 and expr2 that you wish to compute at 80-bit precision, but also need their product with each factor first rounded to 64 bits, use

long double  expr1;
long double  expr2;
double       product = (double)(expr1) * (double)(expr2);

Note that product is computed as the product of two 64-bit values; it is not computed at 80-bit precision and then rounded down. Calculating the product at 80-bit precision and then rounding down would be

double       other = expr1 * expr2;

or, adding descriptive casts that tell you exactly what is happening,

double       other = (double)((long double)(expr1) * (long double)(expr2));

It should be obvious that product and other often differ.
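
To make the difference concrete, here is a self-contained program built on the counter-example values Pascal Cuoq gives in the comments below; on a strict binary64 target (e.g. x86-64 with SSE2 math), product prints 0x1.00f6d98d0a42fp+125 and other prints 0x1.00f6d98d0a43p+125, while on '387, per the edit above, both may come out as the latter:

#include <stdio.h>

int main(void)
{
    /* Values from the counter-example in the comments below. */
    double d1 = 0x1.9fe2693112e14p+62;
    double d2 = 0x1.3c5a02407b71cp+62;

    /* Intended: multiply at binary64 precision (the casts are no-ops on
       values already of type double, but document the intent). */
    double product = (double)(d1) * (double)(d2);

    /* Multiply at 80-bit precision, then round the result to binary64. */
    double other = (double)((long double)(d1) * (long double)(d2));

    printf("product = %a\nother   = %a\n", product, other);
    return 0;
}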

The C99 casting rules are just another tool you must learn to wield if you work with mixed 32-bit/64-bit/80-bit/128-bit floating-point values. Really, you encounter exactly the same issues if you mix binary32 and binary64 (float and double on most architectures)!
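
The same effect in miniature, with no x87 involved (a trivial sketch):

#include <stdio.h>

int main(void)
{
    float  f = 1.0f / 10.0f;   /* 1/10 rounded to a 24-bit significand */
    double d = 1.0 / 10.0;     /* 1/10 rounded to a 53-bit significand */

    /* The conversion (double)f is exact, but f was already rounded to
       24 bits, so it cannot equal the 53-bit rounding of 1/10. */
    printf("%d\n", (double)f == d);   /* prints 0 */
    return 0;
}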

Perhaps rewriting Pascal Cuoq's exploration code to correctly apply the casting rules makes this clearer?

#include <stdio.h>

#define TEST(eq) printf("%-56s%s\n", "" # eq ":", (eq) ? "true" : "false")

int main(void)
{
    double d = 1.0 / 10.0;
    long double ld = 1.0L / 10.0L;

    printf("sizeof (double) = %d\n", (int)sizeof (double));
    printf("sizeof (long double) == %d\n", (int)sizeof (long double));

    printf("\nExpect true:\n");
    TEST(d == (double)(0.1));
    TEST(ld == (long double)(0.1L));
    TEST(d == (double)(1.0 / 10.0));
    TEST(ld == (long double)(1.0L / 10.0L));
    TEST(d == (double)(ld));
    TEST((double)(1.0L/10.0L) == (double)(0.1));
    TEST((long double)(1.0L/10.0L) == (long double)(0.1L));

    printf("\nExpect false:\n");
    TEST(d == ld);
    TEST((long double)(d) == ld);
    TEST(d == 0.1L);
    TEST(ld == 0.1);
    TEST(d == (long double)(1.0L / 10.0L));
    TEST(ld == (double)(1.0L / 10.0));

    return 0;
}

The output, with both GCC and clang, is

sizeof (double) = 8
sizeof (long double) = 12

Expect true:
d == (double)(0.1):                                     true
ld == (long double)(0.1L):                              true
d == (double)(1.0 / 10.0):                              true
ld == (long double)(1.0L / 10.0L):                      true
d == (double)(ld):                                      true
(double)(1.0L/10.0L) == (double)(0.1):                  true
(long double)(1.0L/10.0L) == (long double)(0.1L):       true

Expect false:
d == ld:                                                false
(long double)(d) == ld:                                 false
d == 0.1L:                                              false
ld == 0.1:                                              false
d == (long double)(1.0L / 10.0L):                       false
ld == (double)(1.0L / 10.0):                            false

except that recent versions of GCC promote the right-hand side of ld == 0.1 to long double first (i.e. to ld == 0.1L), yielding true, and that on 64-bit x86 (and with SSE/AVX) long double occupies 128 bits in memory, though the representation is still the x87 80-bit one, padded.

For the pure '387 tests, I used

gcc -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test
clang -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test

with various optimization flag combinations as ..., including -fomit-frame-pointer, -O0, -O1, -O2, -O3, and -Os.

Using any other flags or C99 compilers should lead to the same results, except for the long double size (and ld == 0.1 for current GCC versions). If you encounter any differences, I'd be very grateful to hear about them; I may need to warn my users of such compilers and compiler versions. Note that Microsoft does not support C99, so their compilers are completely uninteresting to me.


Pascal Cuoq does bring up an interesting problem in the comment chain below, which I didn't immediately recognize.

With -mfpmath=387, both GCC and clang evaluate all expressions at 80-bit precision. This leads, for example, to

7491907632491941888 = 0x1.9fe2693112e14p+62 = 110011111111000100110100100110001000100101110000101000000000000
5698883734965350400 = 0x1.3c5a02407b71cp+62 = 100111100010110100000001001000000011110110111000111000000000000

7491907632491941888 * 5698883734965350400 = 42695510550671093541385598890357555200 = 100000000111101101101100110001101000010100100001011110111111111111110011000111000001011101010101100011000000000000000000000000

yielding an incorrect result, because the string of ones in the middle of the binary product falls exactly in the region between the 53rd and 64th mantissa bits (the mantissa widths of 64-bit and 80-bit floating-point numbers, respectively): rounding first to 64 bits and then to 53 (double rounding) does not match rounding directly to 53 bits. So, while the expected result is

42695510550671088819251326462451515392 = 0x1.00f6d98d0a42fp+125 = 100000000111101101101100110001101000010100100001011110000000000000000000000000000000000000000000000000000000000000000000000000

the result obtained with just -std=c99 -m32 -mno-sse -mfpmath=387 is

42695510550671098263984292201741942784 = 0x1.00f6d98d0a43p+125 = 100000000111101101101100110001101000010100100001100000000000000000000000000000000000000000000000000000000000000000000000000000

In theory, you should be able to tell gcc and clang to enforce the correct C99 rounding rules by using options

-std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard

However, this only affects expressions the compiler optimizes, and does not seem to fix the 387 handling at all. If you use e.g. clang -O1 -std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard test.c -o test && ./test with test.c being Pascal Cuoq's example program, you will get the correct result per IEEE-754 rules -- but only because the compiler optimizes away the expression, not using the 387 at all.

Simply put, instead of computing

(double)d1 * (double)d2

both gcc and clang actually tell the '387 to compute

(double)((long double)d1 * (long double)d2)

I believe this is a compiler bug affecting both gcc-4.6.3 and clang-llvm-3.0, and an easily reproduced one. (Pascal Cuoq points out that FLT_EVAL_METHOD=2 means operations on double-precision arguments are always done at extended precision, but I cannot see any sane reason -- aside from having to rewrite parts of libm on '387 -- to do that in C99, considering that the IEEE-754 rules are achievable by the hardware! After all, the correct operation is easily achievable by the compiler, by modifying the '387 control word to match the precision of the expression. And the compiler options that should force this behaviour -- -std=c99 -ffloat-store -fexcess-precision=standard -- make no sense if FLT_EVAL_METHOD=2 behaviour is actually desired, so there are no backwards compatibility issues, either.) It is important to note that, given the proper compiler flags, expressions evaluated at compile time do get evaluated correctly; only expressions evaluated at run time get incorrect results.

The simplest workaround, and the portable one, is to use fesetround(FE_TOWARDZERO) (from fenv.h) to round all results towards zero.

In some cases, rounding towards zero may help with predictability and pathological cases. In particular, for intervals like x = [0,1), rounding towards zero means the upper limit is never reached through rounding; important if you evaluate e.g. piecewise splines.
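
A minimal sketch of this workaround (the reason it is immune to double rounding: truncating a result to 64 significand bits and then to 53 drops exactly the same bits as truncating directly to 53):

#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void)
{
    /* The counter-example values from above. */
    double d1 = 0x1.9fe2693112e14p+62;
    double d2 = 0x1.3c5a02407b71cp+62;

    /* Round all results towards zero; the 80-bit intermediate can then
       no longer change the final double. */
    if (fesetround(FE_TOWARDZERO) != 0)
        return 1;

    printf("%a\n", d1 * d2);
    return 0;
}

(With glibc, link with -lm for fesetround(); compile at -O0 or with -frounding-math so the product is not constant-folded under the default rounding mode.)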

For the other rounding modes, you need to control the 387 hardware directly.

You can use either _FPU_SETCW() from #include <fpu_control.h>, or open-code it. For example, precision.c:

#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

#define FP387_NEAREST   0x0000
#define FP387_ZERO      0x0C00
#define FP387_UP        0x0800
#define FP387_DOWN      0x0400

#define FP387_SINGLE    0x0000
#define FP387_DOUBLE    0x0200
#define FP387_EXTENDED  0x0300

static inline void fp387(const unsigned short control)
{
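    /* 0x0F00 keeps the precision-control (bits 8-9) and rounding-control
       (bits 10-11) fields of the requested word; 0x007f sets the low mask
       bits so that all floating-point exceptions stay masked. */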
    unsigned short cw = (control & 0x0F00) | 0x007f;
    __asm__ volatile ("fldcw %0" : : "m" (*&cw));
}

const char *bits(const double value)
{
    const unsigned char *const data = (const unsigned char *)&value;
    static char buffer[CHAR_BIT * sizeof value + 1];
    char       *p = buffer;
    size_t      i = CHAR_BIT * sizeof value;
    while (i-->0)
        *(p++) = '0' + !!(data[i / CHAR_BIT] & (1U << (i % CHAR_BIT)));
    *p = '\0';
    return (const char *)buffer;
}


int main(int argc, char *argv[])
{
    double  d1, d2;
    char    dummy;

    if (argc != 3) {
        fprintf(stderr, "\nUsage: %s 7491907632491941888 5698883734965350400\n\n", argv[0]);
        return EXIT_FAILURE;
    }

    if (sscanf(argv[1], " %lf %c", &d1, &dummy) != 1) {
        fprintf(stderr, "%s: Not a number.\n", argv[1]);
        return EXIT_FAILURE;
    }
    if (sscanf(argv[2], " %lf %c", &d2, &dummy) != 1) {
        fprintf(stderr, "%s: Not a number.\n", argv[2]);
        return EXIT_FAILURE;
    }

    printf("%s:\td1 = %.0f\n\t    %s in binary\n", argv[1], d1, bits(d1));
    printf("%s:\td2 = %.0f\n\t    %s in binary\n", argv[2], d2, bits(d2));

    printf("\nDefaults:\n");
    printf("Product = %.0f\n\t  %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nExtended precision, rounding to nearest integer:\n");
    fp387(FP387_EXTENDED | FP387_NEAREST);
    printf("Product = %.0f\n\t  %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nDouble precision, rounding to nearest integer:\n");
    fp387(FP387_DOUBLE | FP387_NEAREST);
    printf("Product = %.0f\n\t  %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nExtended precision, rounding to zero:\n");
    fp387(FP387_EXTENDED | FP387_ZERO);
    printf("Product = %.0f\n\t  %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nDouble precision, rounding to zero:\n");
    fp387(FP387_DOUBLE | FP387_ZERO);
    printf("Product = %.0f\n\t  %s in binary\n", d1 * d2, bits(d1 * d2));

    return 0;
}

Using clang-llvm-3.0 to compile and run, I get the correct results,

clang -std=c99 -m32 -mno-sse -mfpmath=387 -O3 -W -Wall precision.c -o precision
./precision 7491907632491941888 5698883734965350400

7491907632491941888:    d1 = 7491907632491941888
        0100001111011001111111100010011010010011000100010010111000010100 in binary
5698883734965350400:    d2 = 5698883734965350400
        0100001111010011110001011010000000100100000001111011011100011100 in binary

Defaults:
Product = 42695510550671098263984292201741942784
          0100011111000000000011110110110110011000110100001010010000110000 in binary

Extended precision, rounding to nearest integer:
Product = 42695510550671098263984292201741942784
          0100011111000000000011110110110110011000110100001010010000110000 in binary

Double precision, rounding to nearest integer:
Product = 42695510550671088819251326462451515392
          0100011111000000000011110110110110011000110100001010010000101111 in binary

Extended precision, rounding to zero:
Product = 42695510550671088819251326462451515392
          0100011111000000000011110110110110011000110100001010010000101111 in binary

Double precision, rounding to zero:
Product = 42695510550671088819251326462451515392
          0100011111000000000011110110110110011000110100001010010000101111 in binary

In other words, you can work around the compiler issues by using fp387() to set the precision and rounding mode.

The downside is that some math libraries (libm.a, libm.so) may be written with the assumption that intermediate results are always computed at 80-bit precision. At least the GNU C library fpu_control.h on x86_64 has the comment "libm requires extended precision". Fortunately, you can take the '387 implementations from e.g. GNU C library, and implement them in a header file or write a known-to-work libm, if you need the math.h functionality; in fact, I think I might be able to help there.

Nominal Animal
“This answer should show that the tools exist and are simple to use; the problems arise from not being aware of or not using those tools.” You are answering a different question than mine. My question is how to take an existing, defined program and predict its behavior when compiled with `-mfpmath=387 -mno-sse`. If you claim that you know how to do that, please take the program in my answer **as written**. As written, it shows behavior inconsistent with any positive value of FLT_EVAL_METHOD and inconsistent with itself at different optimization levels. Don't “fix” the program. – Pascal Cuoq Sep 25 '13 at 06:46
  • “`product` is computed as a product of two 64-bit values; not computed at 80-bit precision, then rounded down” Actually, with FLT_EVAL_METHOD=2, computing the product at 80-bit precision and rounding it down is exactly what the compiler does. The difference is visible when double-rounding happens. – Pascal Cuoq Sep 25 '13 at 06:48
Automatically generated counter-example to your `product` claim: 0x1.9fe2693112e14p+62*0x1.3c5a02407b71cp+62=0x1.00f6d98d0a42fp+125 but (double)((long double)0x1.9fe2693112e14p+62 * 0x1.3c5a02407b71cp+62)=0x1.00f6d98d0a43p+125. You may be interested to hear that I identified a bug in each of the default compilers I was using in generating this. – Pascal Cuoq Sep 25 '13 at 07:24
  • Specifically, the counter-example is `double d1 = 0x1.9fe2693112e14p+62; double d2 = 0x1.3c5a02407b71cp+62; double product = (double)(d1) * (double)(d2);` which follows the pattern of your answer but shows that an 80-bit product took place when compiled with `clang -m32 -mno-sse`. – Pascal Cuoq Sep 25 '13 at 07:41
  • By the way, 128 bits is the size `long double` takes in memory with some x86 compilation platforms but has nothing to do with the underlying floating-point representation, which is still the x87's 80-bit representation. The rest is just padding. – Pascal Cuoq Sep 25 '13 at 08:21
  • @PascalCuoq: I thought the first paragraph in your question was the point. I only meant that C99 does provide the programming tools. I don't care what pre-C99/C11 rules each compiler happens to implement, as I rely on the C99 and later rules. Can you verify your counter-examples with `-std=C99`? – Nominal Animal Sep 25 '13 at 09:28
  • @PascalCuoq: Make that `-std=c99`, of course. Also, GCC does support software-assisted/emulated `__float128` even on architectures where `long double` is just 80-bit. Need an example you can verify it for yourself? Some non-x86 architectures do have "real" 128-bit `long doubles` (ISTR POWER uses twinned binary64's, where the mantissas are concatenated but the other exponent ignored?). – Nominal Animal Sep 25 '13 at 09:39
  • I have already compiled the counter-examples in my answer with `clang -mno-sse2 -std=c99` and shown the results there. The results are inconsistent between optimization levels and with C99 rules for all positive values of FLT_EVAL_METHOD. Are you arguing that I didn't, or that your version of Clang does not produce the same results (and produce coherent results), or what? – Pascal Cuoq Sep 25 '13 at 10:59
  • Regarding `product`, here is a C99 compiler with FLT_EVAL_METHOD=2. My example plainly shows that your statement “product is computed as a product of two 64-bit values; not computed at 80-bit precision, then rounded down” is wrong. http://ideone.com/Xfwemg – Pascal Cuoq Sep 25 '13 at 11:06
  • @PascalCuoq: You're right: both gcc-4.6.3 and clang-llvm-3.0 do not enforce correct rounding rules for '387 runtime expressions; it is a compiler bug (apparently required by the GNU C math library, `libm`; GNU C library `fpu_control.h` says `libm` requires this behaviour!) Any comments on the workarounds (and problem description) I added to my answer? – Nominal Animal Sep 25 '13 at 21:48
    I do not think that computing `(double)((double)e1 * (double)e2)` as `(double)((long double)(double)e1 * (long double)(double)e2)` is a bug. For me (and Joseph S. Myers) it is what FLT_EVAL_METHOD=2 means: you can cast the operands so that they retain only `double` precision, you can cast the result, but the only available multiplication is `long double` multiplication. Now, my question was not technically about the meaning of FLT_EVAL_METHOD, for which I do not see a reason to disagree with Myers' interpretation, but about Clang, which has no excuse to compute my `r1` and `r5` differently. – Pascal Cuoq Sep 25 '13 at 21:58
  • Thanks for your rewritten answer with lots of technical details. My question came to me originally while writing two blog posts: http://blog.frama-c.com/index.php?post/2013/07/06/On-the-precise-analysis-of-C-programs-for-FLT_EVAL_METHOD-2 and http://blog.frama-c.com/index.php?post/2013/07/24/More-on-FLT_EVAL_METHOD_2 . In a static analysis context, one cannot rewrite the program for its meaning to be more obvious: one has to predict what it does as it is written. However, the conclusion of this reflexion is that we are going to continue not to support `long double`, or anyway, not precisely. – Pascal Cuoq Sep 25 '13 at 22:13
  • @PascalCuoq, wrt. `FLT_EVAL_METHOD=2` and C99 casting rules: you might be right: I'm not sure if/how the cast should affect the `*` operator. At least above `fp387()` or `_FPU_SETCW()` does affect it, so us programmers do have a workaround. As to clang, I agree: it is clearly *buggy* here. GNU C `libm` seems written with `FLT_EVAL_METHOD=2` (gcc-4.6.3 defining `__FLT_EVAL_METHOD__=2`), but `-ffloat-store -fexcess-precision=standard -std=c99` causes compile-time expressions to be evaluated at double precision only. So, it is workable if writing new code, but definitely not for static analysis. – Nominal Animal Sep 25 '13 at 23:01
  • There could be advantages to specifying that all computations should be performed by extending to 80 bits and then converting back; even in cases where it would yield a worst-case rounding error of 1029/2048LSB vs 1/2LSB, there would be considerable advantages to being able to say that `x=a+b+c` is equivalent to `long double temp=(a+b); x=temp+c;` even if a-c are shorter types. There are many cases where 80-bit intermediate results will offer major advantages in precision, and where exploiting such precision can offer major advantages in speed; I wish such semantics were still the norm. – supercat Oct 14 '14 at 12:37

For the record, below is what I found by experimentation. The following program shows various behaviors when compiled with Clang:

#include <stdio.h>

int r1, r2, r3, r4, r5, r6, r7;

double ten = 10.0;

int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  r3 = 0.1 == (double) (1.0 / ten);
  r4 = 0.1 == (double) (1.0 / 10.0);
  ten = 10.0;
  r5 = 0.1 == (1.0 / ten);
  r6 = 0.1 == (double) (1.0 / ten);
  r7 = ((double) 0.1) == (1.0 / 10.0);
  printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d\n", r1, r2, r3, r4, r5, r6, r7);
}

The results vary with the optimization level:

$ clang -v
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
$ clang -mno-sse2 -std=c99  t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=0 r7=1
$ clang -mno-sse2 -std=c99 -O2  t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=1 r7=1

The cast (double) that differentiates r5 and r6 at -O0 has no effect at -O2, and has no effect on r3 and r4 at either level. The result r1 is different from r5 at all optimization levels, whereas r6 only differs from r3 at -O2.

Pascal Cuoq