How to discard unwanted extra precision from floating point computations?

Question

First, I'll just say that I know floating point calculations are approximate - subject to rounding errors - so normally you can't test for exact equality and expect it to work. ~~However, floating point calculations are still deterministic - run the same code with the same inputs and you should get the same result.~~ [see edit at end] I've just had a surprise where that didn't work out.

I'm writing a utility to extract some information from Photoshop PSD files. Paths containing cubic Bezier curves are part of that, and I needed to compute axis-aligned bounding boxes for Bezier curves. For the theory, I found A Primer on Bezier Curves via another SO question.

Quick summary of method...

Determine the derivative of the cubic bezier - a quadratic bezier.
Use the quadratic formula to solve the quadratic for each dimension, giving the parameter values for the stopping points (including maxima and minima) of the curve.
Evaluate the cubic bezier positions at those parameters (and the start and end points), expanding the bounding box.
Because I want the actual on-the-boundary points as well as the bounding box, make a second pass, computing positions and rejecting those points that aren't on the final bounding box.

The fourth step was the problem. Expanding the bounding box shouldn't change floating-point values - it just selects largest/smallest values and stores them. I recompute the same points using the same curve control points and the same parameter, but comparing for exact equality with the bounding box failed where it should pass.

Adding some debug output made it work. Removing the debug code but compiling for debug mode made it work. No possibility of memory corruption.

I figured that the stored value (in the form of the bounding box) was somehow lower precision than the newly recomputed position, and that seems to be correct. When I add debug code or compile in debug mode, some inlining doesn't happen, less optimization happens, and as a result the newly computed values get the same precision-loss that the stored bounding box boundary has.

In the end, I resolved the problem by using a volatile variable. The newly recomputed value gets written into the volatile variable, then read back out, thus forcing the compiler to compare a precision-reduced stored version with another precision-reduced stored version. That seems to work.

However, I really have no idea whether that should work - it was just an idea and I know how sensitive things like that can be to compiler-writers interpretation of technicalities in the standard. I don't know precisely what relevant guarantees the standard makes, and I don't know if there's a conventional solution to this issue. I'm not even sure what inspired me to try volatile which IIRC I've not used before since I was working in pure C on embedded systems around 1996-ish.

Of course I could compute the set of position vectors once and store them ready for both the bounding rectangle and the filtering, but I'm curious about this specific issue. As you may guess, I don't do much floating point work, so I'm not that familiar with the issues.

So - is this use of volatile correct? Is it, as I suspect, unnecessarily inefficient? Is there an idiomatic way to force extra precision (beyond the limits of the type) to be discarded?

Also, is there a reason why comparing for exact equality doesn't force the precision of the type first? Is comparing for exact equality with the extra precision (presumably unreliably) preserved ever a useful thing to do?

[EDIT - following Hans Passants first comment, I've given some more thought to what I said about inlining above. Clearly, two different calls to the same code can be optimized differently due to different inlining decisions. That's not just at one level - it can happen at any depth of inlining functions into a piece of code. That means the precision reduction could happen pretty much anywhere, which really means that even when the same source code is used with the same inputs it can give different results.

The FPU is deterministic, so presumably any particular target code is deterministic, but two different calls to the same function may not use the same target code for that function. Now I'm feeling a bit of an idiot, not seeing the implications of what I already figured out - oh well. If there's no better answer in the next few days, I'll add one myself.]

Clearly your assumption that "floating point calculations are still deterministic" was incorrect. It was, the code optimizer blows raspberries at it. Artificial ways to stop it from optimizing and leave values with increased precision stored in the FPU indeed have side-effects. Today, not tomorrow. Time to update your compiler btw. — Hans Passant, Dec 23 '14 at 00:08
Comparing without forcing the precision means the floating point register doesn't need to be reloaded from memory, which can mean quite a savings in a tight loop. C++ is built for speed after all. — Mark Ransom, Dec 23 '14 at 00:12
@Mark - understood, but that doesn't mean there can't be a way to explicitly ask for the extra precision to be discarded when needed. There will certainly be some cost for doing that, but not necessarily (as with this `volatile` variable solution) presumably an unnecessary trip to main memory and back. — , Dec 23 '14 at 00:15
The extra precision is an implementation detail, you're not supposed to notice it or rely on it. — Mark Ransom, Dec 23 '14 at 00:18
@Mark - I'm not trying to rely on that extra precision - exactly the opposite, in fact. I'm trying to explicitly demand that a value of type `float` should have the precision of a value of type `float`. I'm trying to demand that any extra precision, if it exists, is discarded. The whole point is to help me avoid noticing it. — , Dec 23 '14 at 00:26
I think gcc support -fstrict-math to force rounding to the IEEE-754 representation between computations. With that you should get reproducible results. Of course, the C++ standard has no mandated that IEEE-754 is used and delegates any discussion of precision to the underlying implementation of the floating point math: between platforms the results can certainly differ... — Dietmar Kühl, Dec 23 '14 at 00:45
@Mark - never mind - the penny has started to drop, as the edit hopefully shows. — , Dec 23 '14 at 00:48

score 3 · Answer 1 · edited Dec 24 '14 at 03:50

Adding some debug output made it work. Removing the debug code but compiling for debug mode made it work. No possibility of memory corruption.

You seem to be having the sort of trouble described at length by David Monniaux in the article “The pitfalls of verifying floating-point computations”:

The above examples indicate that common debugging practises that apparently should not change the computational semantics may actually alter the result of computations. Adding a logging statement in the middle of a computation may alter the scheduling of registers, […]

Note that most of his reproaches are for C compilers and that the situation—for C compilers—has improved since his article was published: although the article was written after C99, some of the liberties taken by C compilers are clearly disallowed by the C99 standard or are not clearly allowed or are allowed only within a clearly defined framework (look for occurrences of FLT_EVAL_METHOD and FP_CONTRACT in the C99 standard for details).

One way in which the situation has improved for C compilers is that GCC now implements clear, deterministic, standard-compliant semantics for floating-point even when extra precision is present. The relevant option is -fexcess-precision=standard, set by -std=c99.

WRT the question - "How to discard unwanted extra precision from floating point computations?" - the simple answer is "don't". Discarding the precision at one point isn't sufficient as extra precision can be lost or retained at any point in the computation, so apparent reproducibility is a fluke.

Although G++ uses the same back-end as GCC, and although the C++ standard defers to the C standard for the definition of math.h (which provides FLT_EVAL_METHOD), G++ unfortunately does not support -fexcess-precision=standard.

One solution is for you to move all the floating-point computations to C files that you would compile with -fexcess-precision=standard and link to the rest of the application. Another solution is to use -msse2 -mfpmath=sse instead to cause the compiler to emit SSE2 instructions that do not have the “excess precision” problem. The latter two options can be implied by -m64, since SSE2 predates the x86-64 instruction set and the “AMD64 ABI” already uses SSE2 registers for passing floating-point arguments.

Interesting. I note that C++11 references C99 for the C library including `math.h` - I assume similar references to C99 are there for operations like `+` and `*`. Maybe the issue will resolve itself in time. For linking to C, it's a tad awkward because this happens to be template-heavy code - strictly, this bezier handling code mostly doesn't even know it's dealing with floating-point. Which IMO shows why it's even more desirable for C++ floating-point to behave deterministically, as with AFAIK every non-float primitive type. — , Dec 24 '14 at 03:35
Calculations that need to be reproduceable should use precisely-specified semantics at every step; on platforms where reproduceability is not required and use of higher-precision intermediate types could yield better performance (which not only includes x87, but with the right libraries could include many low-end embedded systems without FPUs), it would be better to use higher precision except at specific places where rounding would be required, if one is using a compiler that exposes the highest-precision type to the programmer and has good semantics in that regard. — supercat, Apr 13 '15 at 20:45

score -2 · Answer 2 · answered Dec 23 '14 at 02:01

-2

In your case, as with most cases of floating-point comparison, testing for equality should be done with some tolerance (sometimes called epsilon). If you want to know if a point is on a line, you need to check not only if the distance between them is zero, but also if it is less than some tolerance. This will overcome the "excess precision" problem you're having; you still need to watch out for accumulated errors in complex calculations (which may require a larger tolerance, or careful algorithm design specific to FP math).

There may be platform- and compiler-specific options to eliminate excess FP precision, but reliance on these is not very common, not portable, and perhaps not as efficient as other solutions.

answered Dec 23 '14 at 02:01

John Zwinck

239,568
38
324
436

It makes me wonder how you decide what epsilon to use when writing a very general library. For example, what about the checks for zero in my quadratic solver? In principle, [excess precision could be discarded between the check and the following division](http://stackoverflow.com/q/503436/180247). Apparently [in C99, a cast should discard the excess precision](http://stackoverflow.com/a/503523/180247) but when that answer was written, that wasn't widely implemented. Certainly I tried using a cast before trying `volatile` (MinGW GCC 4.7.0-1, C++98/03 mode) and it didn't help. – Dec 23 '14 at 02:22
Actually, I'm tempted to bypass the issue by switching to nice deterministic fixed-point, especially as performance isn't important to me for this. And maybe it's no fluke that the the Photoshop PSD file format stores the control points for paths in a fixed-point format. – Dec 23 '14 at 02:31
@Steve314: indeed, I should have mentioned the alternative of "just don't use floating point." Other systems like fixed-point may be just the ticket for some applications (see also: currency calculations!). – John Zwinck Dec 23 '14 at 02:32

How to discard unwanted extra precision from floating point computations?

2 Answers2