
Consider

auto x = a + (b-a)*v;

This is meant to create a value in the range [a,b) from the factor v in [0,1.0). From a purely mathematical point of view, x>=a and x<b. But how do we prove or ensure that this holds for floating point? Here a, b, v are non-negative and finite floating-point values of the same type (double or float), b>a (I originally said b>=a, which is obviously incompatible with my requirements on x), and v<=nextafter(1.0,0) (that is, v is just below 1.0).

It seems obvious that b-a > 0, and therefore (b-a)*v >= 0, so that we don't need to check:

if (x<a) return a;

But is this also redundant?

if (x>=b) return std::nextafter(b,a);

May the compiler (optimization) rewrite the expressions in a way that influences these questions? Does the type of floating-point representation enter? (I'm mostly interested in the most common one, iec559 / IEEE 754.)
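
For concreteness, here is a minimal sketch putting the pieces together (lerp_clamped is just an illustrative name, not from any library); the question is whether either guard can ever fire under the stated preconditions:

#include <cmath>

// Preconditions as stated above: a, b, v finite, 0 <= a < b,
// and 0 <= v <= std::nextafter(1.0, 0.0).
double lerp_clamped(double a, double b, double v)
{
    double x = a + (b - a) * v;
    if (x < a)  return a;                     // first guard: believed redundant
    if (x >= b) return std::nextafter(b, a);  // second guard: also redundant?
    return x;
}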

Johan Lundberg
  • I don't understand the concern.. if your equality holds for any (positive) Real, it also holds for any floating point numbers since Q (rationals) is a subset of R (reals) and floating point numbers are a subset of Q ... – Cedric Druck Feb 23 '17 at 09:03
  • I guess the worst that could happen is when your numbers e.g. either v, b or a (or all of them) are extremely small or close to the boundaries, in which case truncation might be an issue .. however truncation is always towards lower value and all your numbers are positive ... – Cedric Druck Feb 23 '17 at 09:05
  • @CedricDruck, Put it this way: If there is no concern it should be easy to prove. Care to write an answer? – Johan Lundberg Feb 23 '17 at 09:08
  • http://stackoverflow.com/questions/2909427/c-floating-point-precision#2909445 – Cedric Druck Feb 23 '17 at 10:53
  • your calculations will be correct with the set precision .. unless you work for Nasa this should be no concern – Cedric Druck Feb 23 '17 at 10:53
  • @CedricDruck I never worked for Nasa but in astro particle physics experiments, and medical physics. Correct code is important in many areas. This has nothing to do with the link you posted or the comment on "set precision", as it's not about displayed values, but the guaranteed range. – Johan Lundberg Feb 23 '17 at 12:16
  • Ok that does make sense! Can you specify the expected ranges for a, b (you already said v is in [0, 1) )? – Cedric Druck Feb 23 '17 at 12:47
  • Btw the code says x is a simple linear projection between a and b, isn't there a validated code that exists in a math library that does that and covers for all possible values? – Cedric Druck Feb 23 '17 at 12:48
  • I suggest we remove all our comments and wait for a proper answer. ... This is not a chat forum. – Johan Lundberg Feb 23 '17 at 13:40
  • Woo. I just posted a question to SO in case someone who knows the answer answers. Feel free to not care about the question. – Johan Lundberg Feb 23 '17 at 18:04

1 Answer


It seems obvious that b-a > 0, and therefore (b-a)*v >= 0, so that we don't need to check: if (x<a) return a;

The property b - a > 0 is correct in IEEE 754, but I wouldn't say that it is obvious. At the time of floating-point standardization, Kahan fought for this property, which results from “gradual underflow”, to be true. Other proposals did not have subnormal numbers and did not make it true. You could have b > a and b - a == 0 in these other proposals, for instance by taking a to be the smallest positive number and b its successor.
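
A quick illustrative check (a sketch of my own, assuming IEEE 754 double with gradual underflow and round-to-nearest): the smallest positive double and its successor do have a strictly positive difference.

#include <cassert>
#include <cmath>

int main()
{
    double a = std::nextafter(0.0, 1.0); // smallest positive (subnormal) double
    double b = std::nextafter(a, 1.0);   // its successor
    // With gradual underflow the subtraction is exact and positive;
    // a format without subnormals (or a flush-to-zero mode) could give 0 here.
    assert(b - a > 0.0);
}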

Even without gradual underflow, that is, on a system that mis-implements IEEE 754 by flushing subnormals to zero, b > a still implies b - a >= 0, so there is no need to worry about x ending up below a.

But is this also redundant? if (x>=b) return std::nextafter(b,a);

This test is not redundant even in IEEE 754. For an example, take b to be the successor of a. For all values of v above 0.5, in the default round-to-nearest mode, the result of a + (b-a)*v is b, which you were trying to avoid.
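
A small sketch exhibiting this counter-example (illustrative, assuming IEEE 754 double and round-to-nearest):

#include <cassert>
#include <cmath>

int main()
{
    double a = 1.0;
    double b = std::nextafter(a, 2.0); // b is the successor of a
    double v = 0.75;                   // any v in (0.5, 1) will do
    double x = a + (b - a) * v;
    // (b - a) * v is more than half an ulp of a, so the sum
    // rounds up to b: the test x >= b is needed after all.
    assert(x == b);
}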


My examples are built with unusual values for a and b because that saves me from writing a program to find counter-examples through brute force, but do not assume that other, more likely, pairs of values for a and b do not exhibit the problem. If I were looking for additional counter-examples, I would in particular look for pairs of values for which the floating-point subtraction b - a rounds up. A rough search of that kind is sketched below.
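
For instance, a brute-force search could look like this (my own sketch, not a validated tool): it draws random float pairs, fixes v just below 1.0, and prints any pair for which x reaches b.

#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(0.0f, 10.0f);
    float v = std::nextafter(1.0f, 0.0f); // worst case: v just below 1.0
    for (int i = 0; i < 10000000; ++i) {
        float a = dist(gen), b = dist(gen);
        if (!(a < b)) continue;          // enforce the precondition a < b
        float x = a + (b - a) * v;
        if (x >= b)                      // counter-example found
            std::printf("a=%a b=%a x=%a\n", a, b, x);
    }
}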


EDIT: Oh alright, here is another counter-example:

Take a to be the successor of the successor of -1.0 (that is, in double-precision, using C99's hexadecimal notation, -0x1.ffffffffffffep-1) and b to be 3.0. Then b - a rounds up to 4.0, and taking v to be the predecessor of 1.0, a + (b - a) * v rounds up to 3.0.

It is not even necessary for the floating-point subtraction b - a to round up in order to obtain a counter-example: taking a as the successor of 1.0 and b as 2.0 also works.
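
Both counter-examples from this edit can be checked directly (a quick sketch, assuming IEEE 754 double and round-to-nearest):

#include <cassert>
#include <cmath>

int main()
{
    double v = std::nextafter(1.0, 0.0); // predecessor of 1.0

    // b - a rounds up: a is the successor of the successor of -1.0, b is 3.0.
    double a1 = std::nextafter(std::nextafter(-1.0, 0.0), 0.0);
    double b1 = 3.0;
    assert(b1 - a1 == 4.0);           // the subtraction rounded up to 4.0
    assert(a1 + (b1 - a1) * v == b1); // the result lands exactly on b

    // b - a is exact: a is the successor of 1.0, b is 2.0.
    double a2 = std::nextafter(1.0, 2.0);
    double b2 = 2.0;
    assert(a2 + (b2 - a2) * v == b2); // still lands on b
}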

Pascal Cuoq