First, I'll just say that I know floating point calculations are approximate - subject to rounding errors - so normally you can't test for exact equality and expect it to work. However, floating point calculations are still deterministic - run the same code with the same inputs and you should get the same result. [see edit at end] I've just had a surprise where that didn't work out.
I'm writing a utility to extract some information from Photoshop PSD files. Paths containing cubic Bezier curves are part of that, and I needed to compute axis-aligned bounding boxes for Bezier curves. For the theory, I found A Primer on Bezier Curves via another SO question.
Quick summary of method...
- Determine the derivative of the cubic bezier - a quadratic bezier.
- Use the quadratic formula to solve the quadratic for each dimension, giving the parameter values for the stopping points (including maxima and minima) of the curve.
- Evaluate the cubic bezier positions at those parameters (and the start and end points), expanding the bounding box.
- Because I want the actual on-the-boundary points as well as the bounding box, make a second pass, computing positions and rejecting those points that aren't on the final bounding box.
The fourth step was the problem. Expanding the bounding box shouldn't change floating-point values - it just selects largest/smallest values and stores them. I recompute the same points using the same curve control points and the same parameter, but comparing for exact equality with the bounding box failed where it should pass.
Adding some debug output made it work. Removing the debug code but compiling for debug mode made it work. No possibility of memory corruption.
I figured that the stored value (in the form of the bounding box) was somehow lower precision than the newly recomputed position, and that seems to be correct. When I add debug code or compile in debug mode, some inlining doesn't happen, less optimization happens, and as a result the newly computed values get the same precision-loss that the stored bounding box boundary has.
In the end, I resolved the problem by using a volatile
variable. The newly recomputed value gets written into the volatile
variable, then read back out, thus forcing the compiler to compare a precision-reduced stored version with another precision-reduced stored version. That seems to work.
However, I really have no idea whether that should work - it was just an idea and I know how sensitive things like that can be to compiler-writers interpretation of technicalities in the standard. I don't know precisely what relevant guarantees the standard makes, and I don't know if there's a conventional solution to this issue. I'm not even sure what inspired me to try volatile
which IIRC I've not used before since I was working in pure C on embedded systems around 1996-ish.
Of course I could compute the set of position vectors once and store them ready for both the bounding rectangle and the filtering, but I'm curious about this specific issue. As you may guess, I don't do much floating point work, so I'm not that familiar with the issues.
So - is this use of volatile
correct? Is it, as I suspect, unnecessarily inefficient? Is there an idiomatic way to force extra precision (beyond the limits of the type) to be discarded?
Also, is there a reason why comparing for exact equality doesn't force the precision of the type first? Is comparing for exact equality with the extra precision (presumably unreliably) preserved ever a useful thing to do?
[EDIT - following Hans Passants first comment, I've given some more thought to what I said about inlining above. Clearly, two different calls to the same code can be optimized differently due to different inlining decisions. That's not just at one level - it can happen at any depth of inlining functions into a piece of code. That means the precision reduction could happen pretty much anywhere, which really means that even when the same source code is used with the same inputs it can give different results.
The FPU is deterministic, so presumably any particular target code is deterministic, but two different calls to the same function may not use the same target code for that function. Now I'm feeling a bit of an idiot, not seeing the implications of what I already figured out - oh well. If there's no better answer in the next few days, I'll add one myself.]