
Yesterday I was tracking down a bug in my project, which - after several hours - I narrowed down to a piece of code that was more or less doing something like this:

#include <iostream>
#include <cmath>
#include <cassert>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    float sx = r * cosf(alpha); // -0.911326
    float sy = r * sinf(alpha); // -0.359146
    float ex = r * cosf(alpha); // -0.911326
    float ey = r * sinf(alpha); // -0.359146
    float mx = ex - sx;     // should be 0
    float my = ey - sy;     // should be 0
    float distance = sqrtf(mx * mx + my * my) * 57.2958f;   // should be 0, gives 1.34925e-06

//  std::cout << "sv: {" << sx << ", " << sy << "}" << std::endl;
//  std::cout << "ev: {" << ex << ", " << ey << "}" << std::endl;
//  std::cout << "mv: {" << mx << ", " << my << "}" << std::endl;
    std::cout << "distance: " << distance << std::endl;

    assert(distance == 0.f);
//  assert(sx == ex && sy == ey);
//  assert(mx == 0.f && my == 0.f);
} 

After compilation and execution:

$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out 
distance: 1.34925e-06
a.out: vfma.cpp:23: int main(): Assertion `distance == 0.f' failed.
Aborted (core dumped)

From my point of view something is wrong, as I've asked for two subtractions of bitwise-identical pairs (I expected to get two zeroes), then squaring them (zeroes again) and adding them together (zero).

It turns out that the root cause of the problem is the use of the fused multiply-add operation, which somewhere along the line makes the result inexact (from my point of view). Generally I have nothing against this optimization, as it promises to give more exact results, but in this case 1.34925e-06 is really far from the 0 I was expecting.
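
If I understand correctly, the contraction effectively turns one of the subtractions into a fused operation, roughly like the sketch below (the exact code the compiler emits may differ):

#include <iostream>
#include <cmath>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    // a sketch of what the contraction effectively computes - the real codegen may differ
    float sx = r * cosf(alpha);             // product rounded to float
    float mx = fmaf(r, cosf(alpha), -sx);   // exact product minus the rounded product
    std::cout << "mx: " << mx << std::endl; // the rounding error of r * cosf(alpha), not 0
}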

The test case is very "fragile" - if you enable more prints or more asserts, it stops asserting, because the compiler doesn't use fused multiply-add anymore. For example, if I uncomment all the lines:

$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out 
sv: {-0.911326, -0.359146}
ev: {-0.911326, -0.359146}
mv: {0, 0}
distance: 0

As I considered this to be a bug in the compiler, I reported it, but it was closed with the explanation that this is correct behaviour.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79436

So I'm wondering - how should one code such calculations to avoid the problem? I was thinking about a generic solution, but something better than:

mx = ex != sx ? ex - sx : 0.f;

I would like to fix or improve my code - if there's anything to fix or improve - instead of setting -ffp-contract=off for my whole project, as fused multiply-add is used internally in the compiler libraries anyway (I see a lot of it in sinf() and cosf()), so it would be a "partial work-around", not a solution... I would also like to avoid solutions like "don't use floating-point" (;

  • Floating point numbers/arithmetic is potentially inexact. This is a well-known "feature" of floating-point arithmetic. – DisappointedByUnaccountableMod Feb 09 '17 at 09:10
  • @barny - I know, but for things like subtracting two identical numbers or multiplying anything by zero, floating-point arithmetic was perfectly accurate. "Was" - because with fused multiply-add it's no longer the case... Also, I think the scale of the error here is quite big. If I got something like 1e-64 I would not be asking this question... – Freddie Chopin Feb 09 '17 at 09:20
  • Good luck with your quest.. – DisappointedByUnaccountableMod Feb 09 '17 at 09:24
  • GCC (and I think ICC as well) contracts but Clang does not by default. I [asked a question about this](http://stackoverflow.com/q/34436233/2542702) because I was surprised that GCC does this. Apparently, many people did not expect this either. It turns out it does not violate IEEE so GCC is still conforming to do this. – Z boson Feb 14 '17 at 10:07
  • I think there are two possible solutions to consider. 1.) Only use FMA explicitly. This means you compile with `-ffp-contract=off -mfma` and then use `fma` functions or intrinsics to get FMA only when you want it. 2.) Design your code so it deals with floating point errors with and without FMA operations so that it's not sensitive to FMA operations. – Z boson Feb 14 '17 at 10:47
  • You could add `float mx_fma = fmaf(r, cosf(alpha), -r*cosf(alpha))` to your tests in your question. It should produce the same results. You could then compile with `-ffp-contract=off` and see what you get. You probably won't learn anything that you don't expect from this but nevertheless I think it's interesting to try. – Z boson Feb 14 '17 at 13:16

3 Answers


In general, no: this is exactly the price you pay for using -ffp-contract=fast (coincidentally, it is precisely this example that William Kahan notes when discussing the problems with automatic contraction).

Theoretically, if you were using C (not C++), and your compiler supported the C99 pragmas (i.e. not gcc), you could use

#pragma STDC FP_CONTRACT OFF
// non-contracted code
#pragma STDC FP_CONTRACT ON
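
For example, with a compiler that honours the pragma (clang does; gcc currently does not), a minimal C sketch could confine the sensitive subtraction to a non-contracted region:

#include <math.h>
#include <stdio.h>

volatile float r = -0.979541123f;
volatile float alpha = 0.375402451f;

/* Contraction is disabled only inside this function. */
static float delta_x(void)
{
#pragma STDC FP_CONTRACT OFF
    float sx = r * cosf(alpha);
    float ex = r * cosf(alpha);
    return ex - sx; /* no FMA allowed here, so this stays exactly 0 */
}

int main(void)
{
    printf("mx = %g\n", delta_x());
    return 0;
}
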
  • I can disable contraction for the whole file with `-ffp-contract=off` or implement some form of work-around, but the problem is that tracking down bugs like this takes a long time. That's why I'm wondering whether there's a way of avoiding the problem in the first place. – Freddie Chopin Feb 09 '17 at 20:35

Interestingly, thanks to fma, the floats mx and my give you the rounding error that was made when multiplying r and cos:

fma(r, cos, -r*cos) = theoretical(r*cos) - float(r*cos)

So the result you get somehow indicates how far the computed (sx,sy) was from the theoretical (sx,sy), due to the multiplication of floats (but not accounting for the rounding errors in the computations of cos and sin).

So the question is: how can your program rely on a difference (ex-sx, ey-sy) that lies within the uncertainty interval caused by floating-point rounding?
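
You can check this directly with the values from the question (a sketch; fmaf computes the exact product and then subtracts the already-rounded one):

#include <iostream>
#include <cmath>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    float c = cosf(alpha);
    float product = r * c;              // rounded to the nearest float
    float error = fmaf(r, c, -product); // exact r*c minus the rounded product
    // This error is exactly the magnitude that ends up in mx/my when the
    // compiler contracts ex - sx into a fused multiply-add.
    std::cout << "rounding error of r*c: " << error << std::endl;
}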

  • Well, the code does not care that much about precision of calculations down to the single bit, but in this particular case it really cares whether it gets zero or "something else". That's because this distance is then used in the calculation of time, and instead of getting a "0 length move in 0 time" I get a "very small move in some absurd time". – Freddie Chopin Feb 09 '17 at 20:39

I can see this question has been around for a while, but in case other folks come across it looking for an answer, I figured I'd mention a couple of points.

First, it's difficult to tell exactly without analyzing the resulting assembly code, but I suspect the reason the FMA gives a result so far outside expectations is not just the FMA itself, but also the assumption that all of the calculations are being done in the order you have specified them; with optimizing C/C++ compilers this is often not the case. This is also likely why uncommenting the print statements changes the results.

If mx and my were being calculated as the comments suggest, then even if the final mx*mx + my*my were done with an FMA, it would still produce the expected 0 result. The problem is that, since none of the sx/sy/ex/ey/mx/my variables are used by anything else, there is a good possibility that the compiler never actually evaluates them as independent variables at all. It may simply mush all the math together into one big mass of multiplications, adds, and subtracts that calculates distance in a single step, which can then be represented any number of different ways in machine code (in any order, potentially with multiple FMAs, etc.), however it figures it will get the best performance for that one big calculation.

However, if something else (like a print statement) references mx and my, then it's much more likely the compiler will calculate them separately, before calculating distance as a second step. In that case, the math does work out the way the comments suggest, and even an FMA in the final distance calculation doesn't change the results (because the inputs are all exactly 0).

The Answer

But that doesn't actually answer the real question. The most robust (and generally recommended) way to avoid this sort of problem in general is: never assume floating-point operations will produce an exact number, even if that number is 0. This means that, in general, it's a bad idea to use == to compare floating-point numbers at all. Instead, you should choose a small number (often referred to as epsilon) which is larger than any possible/likely accumulated error, but still smaller than any significant result. For example, if you know that the distances you care about are only really significant to a couple of decimal places, you could choose EPSILON = 0.01, which means "any difference less than 0.01 we'll consider to be the same as zero". Then, instead of saying:

assert(distance == 0.f);

you would say:

assert(distance < EPSILON);

(the exact value for your epsilon will likely depend on the application, and may even be different for different types of calculations, of course)

Likewise, instead of saying something like if (a == b) for floating point numbers, you would instead say something like if (abs(a - b) < EPSILON), etc.
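
For example, a couple of small helpers along these lines (the EPSILON value here is just a placeholder; pick one that suits the scale of your data):

#include <cmath>

// Hypothetical tolerance - tune it to the magnitudes your application cares about.
constexpr float EPSILON = 1e-4f;

inline bool approximately_zero(float x)
{
    return std::fabs(x) < EPSILON;
}

inline bool approximately_equal(float a, float b)
{
    return std::fabs(a - b) < EPSILON;
}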

Another way to reduce (but not necessarily eliminate) this problem is to implement "fail-fast" logic in your application. For example, in the above code, instead of going all the way through, calculating distance and then checking whether it's 0 at the end, you could "short-circuit" some of the math by testing if (fabs(mx) < EPSILON && fabs(my) < EPSILON) before you even get to the point of calculating distance, and skip the rest if both are effectively zero (since you know the result will be zero in that case). The quicker you catch the situation, the less opportunity there is for errors to accumulate (and sometimes you can also avoid doing more costly calculations you don't actually need).
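
Applied to the code from the question, the fail-fast version might look roughly like this (a sketch; EPSILON is the same placeholder tolerance as above):

#include <cmath>

constexpr float EPSILON = 1e-4f;   // hypothetical tolerance, as above

// Returns 0 as soon as the difference vector is negligible, instead of letting
// tiny rounding errors propagate into the distance (and later into a time).
float move_distance(float sx, float sy, float ex, float ey)
{
    float mx = ex - sx;
    float my = ey - sy;
    if (std::fabs(mx) < EPSILON && std::fabs(my) < EPSILON)
        return 0.0f;               // fail fast: treat it as a zero-length move
    return std::sqrt(mx * mx + my * my) * 57.2958f;
}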

  • So you can't convert a float to an int, ever? Or do any boolean test, ever? It seems like fp math is broken and shouldn't be used for any serious purpose, ever. – curiousguy Feb 21 '19 at 01:20