How to bound a floating-point arithmetic result?

Question

Floating-point operations like x = a/b usually do not produce an exactly representable result, so the CPU has to round. Is it possible to get the two floats x_low and x_up that are, respectively, the largest float less than or equal to the exact value of a/b and the smallest float greater than or equal to a/b?

The conditions are:

a, b, x_low, x_up and x are float
a and b are positive integers (1.0f, 2.0f, etc.)

it is representable, though the representation is not necessarily exact — 463035818_is_not_an_ai, Jun 01 '22 at 10:41
Any language where this problem is solvable, but I prefer C++. C and C++ are not that different regarding the arithmetic. — rafoo, Jun 01 '22 at 10:48
By the way, if there is a rule like "it always rounds down", then I can just take the next floating-point value, but I am not sure that it does that. — rafoo, Jun 01 '22 at 10:50
See [std::nextafter](https://en.cppreference.com/w/cpp/numeric/math/nextafter) — prapin, Jun 01 '22 at 10:51
In C++ you can create your own class, which will keep the low and the high bounds and overload the arithmetic operators on this class. The calculations are not trivial though, as you need to deal with special values, very small values, etc. — Alex Sveshnikov, Jun 01 '22 at 10:51
This question concerns only `float` (not even `double`, if that changes anything) — rafoo, Jun 01 '22 at 10:52
" a , b are `float` " & "a and b are integer" ?!? Did you perhaps mean `a` and `b` can be considered to be exact? — 463035818_is_not_an_ai, Jun 01 '22 at 10:56
Yes, they are `float` (the type) and integers, not `int`. That's what the markdown is for, right? I think some floats are exactly representable without being integers, like 1/2. Conversely, some numbers are integers but not exactly representable, like 10^20. — rafoo, Jun 01 '22 at 10:57
strictly speaking not all integers are exactly representable as `float` (https://stackoverflow.com/a/43656339/4117728), but ok, I get what you mean — 463035818_is_not_an_ai, Jun 01 '22 at 11:03
@463035818_is_not_a_number: Re “it is representable, though the representation is not necessarily exact”: That is not what “representable” means. If a particular floating-point datum represents, for example, 3.125, then it represents 3.125 and not any other number. It does not represent 3.125000000001. This is per the IEEE-754, C++, and C standards: Each floating-point datum represents one specific number (or a NaN). Floating-point operations, not numbers, approximate real arithmetic by producing results equivalent to real-arithmetic results rounded to representable numbers… — Eric Postpischil, Jun 01 '22 at 11:47
… This distinction is crucial to designing, analyzing, proving, and debugging floating-point software. — Eric Postpischil, Jun 01 '22 at 11:47
@rafoo: The C and C++ standards specify some optional features for controlling rounding methods in floating-point operations, but support for them is not great, and changing rounding methods does not perform well on most processors. So, you may be able to write code that sets the rounding method to “toward −∞,” perform an operation to get a lower bound on the result, then change the method to “toward +∞,” and repeat the operation to get an upper bound. You will likely find this cumbersome… — Eric Postpischil, Jun 01 '22 at 11:52
… Further, you then have two bounds. To use this in further arithmetic, you have to manage operands that are not single numbers but that are intervals with lower and upper bounds. This is called interval arithmetic, and it has been around conceptually for many decades at least, but it has not proven to be popular, likely due to its lack of usefulness for its cost. Given two intervals [a, b] and [c, d] and some operation •, finding [a, b]•[c, d] can depend not only on the operation but also on the specific values. For example, for division, if a, b, c, and d are all positive, then… — Eric Postpischil, Jun 01 '22 at 11:54
… the answer is [lower(a/d), upper(b/c)], where “lower” computes with rounding down (toward −∞), and “upper” computes with rounding upward. However, if, say, a, b, and d are positive while c is negative, then the interval [a, b] / [c, d] wraps around infinity; it includes all the numbers from a/d up to +∞ and the numbers from −∞ up to a/c. The sets of potential results can grow to be complicated, and this, as well as growth of the interval sizes as more operations are performed, limits the usefulness of interval arithmetic. — Eric Postpischil, Jun 01 '22 at 11:59
@rafoo Assuming you can use C, have you tried using `fenv.h` and in particular the function `fesetround()` to set the rounding mode for the division operation? This was introduced with ISO-C99, may not work (or may not work reliably) with all compilers, and may result in slow code. In my experience, it works well with the Intel C/C++ compiler, but there is a performance impact due to the overhead of changing rounding mode on x86-architecture processors. — njuffa, Jun 01 '22 at 15:41
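The directed-rounding approach described in the comments above could look roughly like the following. This is only a sketch: `float_bounds` and `divide_directed` are made-up names, it assumes the platform defines `FE_DOWNWARD` and `FE_UPWARD` and that the compiler actually honors the dynamic rounding mode (with GCC or Clang you may need `-frounding-math`), and the `volatile` qualifiers are a workaround to discourage constant folding and reordering of the divisions.

```cpp
#include <cfenv>

// Hypothetical result type: lower <= exact a/b <= upper.
struct float_bounds {
    float lower;
    float upper;
};

// Hypothetical helper: performs the same division twice, once per
// directed rounding mode, and restores the previous mode afterwards.
float_bounds divide_directed(float a, float b) {
    // volatile is an assumption-laden workaround: it keeps the compiler
    // from evaluating the divisions at compile time or moving them
    // across the fesetround calls.
    volatile float va = a, vb = b;
    volatile float lo, hi;

    const int saved = std::fegetround();
    std::fesetround(FE_DOWNWARD);   // round toward -infinity
    lo = va / vb;
    std::fesetround(FE_UPWARD);     // round toward +infinity
    hi = va / vb;
    std::fesetround(saved);         // restore the caller's rounding mode

    return { lo, hi };
}
```

When the quotient is exactly representable, both modes give the same value, so `lower == upper`. As the comments note, switching the rounding mode can be slow, so this is probably not something to do in a hot loop.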

score 0 · Answer 1 · answered Jun 01 '22 at 11:49

This will give you bounds that might be wider than necessary:

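#include <cmath>
#include <limits>
#include <utility>

// Assumes a and b are positive and exactly representable in T.
template<typename T>
std::pair<T, T> bounds(int a, int b) {
    const T inf = std::numeric_limits<T>::infinity();
    T ta = a, tb = b;
    // Neighbours of each operand: one step toward -inf and one toward +inf.
    T ta_prev = std::nextafter(ta, -inf), ta_next = std::nextafter(ta, inf);
    T tb_prev = std::nextafter(tb, -inf), tb_next = std::nextafter(tb, inf);
    // Smallest numerator over largest denominator, and vice versa.
    return std::make_pair(ta_prev / tb_next, ta_next / tb_prev);
}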

score 0 · Answer 2 · answered Jun 01 '22 at 12:49

An easy way to do it is to do the division in higher precision and get the upper/lower bound on conversion to float:

#include <cmath>   // std::isnan, std::nextafter
#include <limits>  // std::numeric_limits

struct float_range {
    float lower;
    float upper;
};

float_range to_float_range(double d) {
    float as_float = static_cast<float>(d);
    double rounded = double{as_float};
    if (std::isnan(as_float) || rounded == d) {
        // No rounding done
        return { as_float, as_float };
    }
    if (rounded < d) {
        // rounded down
        return { as_float, std::nextafter(as_float, std::numeric_limits<float>::infinity()) };
    }
    // rounded up
    return { std::nextafter(as_float, -std::numeric_limits<float>::infinity()), as_float };
}

float_range precise_divide(float a, float b) {
    return to_float_range(double{a}/double{b});
}
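If no wider type is available (for example, when the operands are already the widest type you have), one possible alternative is to look at the sign of the residual a - q*b, computed with a fused multiply-add. This is a different technique from the one in the answer above, and the names `float_interval` and `divide_with_residual` are made up for illustration. The sketch assumes a and b are finite, b is nonzero, and neither the quotient nor the residual overflows or underflows.

```cpp
#include <cmath>
#include <limits>

// Hypothetical result type: lower <= exact a/b <= upper.
struct float_interval {
    float lower;
    float upper;
};

// Hypothetical helper. Assumes finite a, finite nonzero b, and no
// overflow or underflow in the quotient or the residual.
float_interval divide_with_residual(float a, float b) {
    const float inf = std::numeric_limits<float>::infinity();
    const float q = a / b;               // quotient, rounded to nearest
    // fma rounds only once, so the sign of a - q*b is reliable under
    // the stated assumptions.
    const float r = std::fma(-q, b, a);
    if (r == 0.0f) {
        return { q, q };                 // the quotient is exact
    }
    // exact a/b = q + r/b, so q is too small when r and b share a sign.
    const bool q_too_small = (r > 0.0f) == (b > 0.0f);
    if (q_too_small) {
        return { q, std::nextafter(q, inf) };
    }
    return { std::nextafter(q, -inf), q };
}
```

Like the double-precision version, this yields adjacent floats whenever the quotient is inexact, and a degenerate interval when it is exact.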
