How to round a float to epsilon overflow-correctly

Question

For the purpose of ordering floating-point numbers in a data structure which doesn't allow duplicates (like a binary tree, heap, etc.), I want to round float to a certain level of precision, as follows:

float round_to_epsilon(x);

bool less_than(float x, float y) {
    return round_to_epsilon(x) < round_to_epsilon(y);
}

In other words, I want to quantize floats to multiples of std::numeric_limits<float>::epsilon, which is 2^-23.

1st Idea: Multiply and Floor

At first glance, it may seem like you could just do:

float round_to_epsilon(float x) {
    // note: floor or ceil must be used to avoid bias around zero
    return std::floor(x / std::numeric_limits<float>::epsilon());
}

However, this will turn every number greater than 2¹⁰⁵ into positive infinity, which is not overflow-correct.

2nd Idea: Divide to Discard Precision

An alternative is to multiply by a really small number to intentionally cause underflow, and drop some digits. However, this makes denormalized numbers quite likely, so there may be serious performance issues with such an approach.

Question

How can I round to epsilon; or in other words, how can I drop digits below a magnitude of 2^-23, without causing unnecessary overflow or underflow?

My intuition is that perhaps I could divide by a power of two and then multiply again to obtain avoid denormalized numbers, but I'm unsure how to do that, or whether it is a good approach.

Note: if `abs(x) >= 1.0f` then the result should be equal to the input. Numbers that don't hit this early return have no risk of overflow. — Ben Voigt, Aug 22 '23 at 19:59
Also note that `ldexp` is nice for multiplying and dividing by powers of two. — Ben Voigt, Aug 22 '23 at 20:00
Knuth's floating point comparisons might be useful: https://stackoverflow.com/a/253874/4641116 — Eljay, Aug 22 '23 at 20:12
In your related answer, you had a version using `modf`. I think you actually want to use `std::fmod` — Ben Voigt, Aug 22 '23 at 20:18
Quantizing for sorting is a bad idea. The requested value can be computed with `x - std::remainder(x, std::numeric_limits::epsilon)` (or variant depending on desired rounding), but you should generally not attempt to quantize floating-point values or otherwise “correct” for errors. The values computed by floating-point arithmetic are as good as floating-point arithmetic can provide, and attempting to “correct” errors without additional information about what is correct makes them worse (albeit sometimes only on average, such as improving many values slightly but making some much worse). — Eric Postpischil, Aug 22 '23 at 20:22
@EricPostpischil: https://stackoverflow.com/a/76954284/103167 (and various past revisions thereof). That will provide the additional context -- this wasn't sorting for the sake of sorting, but for keying a collection. — Ben Voigt, Aug 22 '23 at 20:25
why round for sorting in the first place? You will get `less_than(a,b) == false` and `less_than(b,a) == false` for certain `a` and `b` with `a — 463035818_is_not_an_ai, Aug 22 '23 at 20:26
@EricPostpischil strictly speaking, the purpose is not for sorting a list of numbers, but keeping something like a `std::set` or a heap with some error tolerance, where equivalence of elements in the data structure is established through the less-than operator. — Jan Schultke, Aug 22 '23 at 20:27
ordering isnt the issue, equivalence is. In the question you only mention ordering.... nevermind, last edit fixed it — 463035818_is_not_an_ai, Aug 22 '23 at 20:29
@JanSchultke: Quantizing does not serve for establishing equivalence. Quantization is a discontinuous function. It partitions the floating-point numbers into segments, and there are numbers that are adjacent in the floating-point format that are separated by quantization (one at the end of one segment and another at the start of the next). Unless the set of numbers has no floating-point values that lie in these areas or has other special properties involving them, basing equivalence on quantization will fail. — Eric Postpischil, Aug 22 '23 at 21:13

chux - Reinstate Monica · Answer 1 · 2023-08-22T21:37:57.653

... to quantize floats to multiples of std::numeric_limits<float>::epsilon, which is 2^-23.

Although I do not think this is a good idea as OP's goal appears to be a solution to an xy problem, it can be done with:

float round_to_epsilon_muliple(float f) {
  // Rounding only needed for small values as larger ones 
  // are already a multiple of epsilon.
  if (fabs(f) < 1.0f) {
    static const float one_over_epsilon =
        1.0f/std::numeric_limits<float>::epsilon();
    f = round(f * one_over_epsilon); // or floor(), trunc(), etc.
    f /= one_over_epsilon;
  }
  return f;
}

@Eric Postpischil offers a simplified approach:

float round_to_epsilon_muliple_EP(float f) {
  if (isfinite(f)) {
    return f - std::remainder(f, std::numeric_limits<float>::epsilon());
  }
  return f;
}

Rounding logarithmically distributed float to a linear multiple of epsilon seems strange as it takes the float out of floating-point.

Perhaps OP wants to reduce the precision of a float?

Consider then FP frexp(FP f, int *p) to extract the fraction/power and manipulate those.


float reduce_precision(float f) {
  if (isfinite(f)) {
    int p;
    f = float(f, &p);
    // Let us say we want to lop off 4 of the least significant bits.
    static const unsigned long lop = 1LU << 4;
    static const float one_over_epsilon =
        1.0f/std::numeric_limits<float>::epsilon();
    static const float one_over_scale = one_over_epsilon / lop;
    f = trunc(f * one_over_scale); // or round(), ...
    f /= one_over_scale;
    f = ldexp(f, p);
  }
  return f;
}

Why not `return x - std::remainder(x, std::numeric_limits::epsilon();`? — Eric Postpischil, Aug 22 '23 at 21:17
@EricPostpischil Looks good too (with a test to avoid infinity - infinity) and disturbing `x` as a NAN. — chux - Reinstate Monica, Aug 22 '23 at 21:31

njuffa · Answer 2 · 2023-08-22T21:13:39.873

In other words, I want to quantize floats to multiples of std::numeric_limits<float>::epsilon

I will answer the question as asked, without going into discussion as to whether this is the most suitable approach for the use case stated. Your first idea is actually workable, provided one takes care of the ends of the input domain.

Numbers sufficiently large in magnitude are naturally quantized to integral multiples of eps. Numbers sufficiently smaller than eps will round to zero. For rounding to nearest-or-even during quantization one can use rint(), provided the current rounding mode is set to round-to-nearest-or-even (this is the default in most environments). This caveat is necessary because as opposed to floor(), ceil(), trunc(), and round(), rint() does not operate with a fixed rounding mode, but uses the rounding mode currently in effect.

float round_to_epsilon (float x)
{
    float eps, cutoff, r;
    eps = std::numeric_limits<float>::epsilon();
    cutoff = 0.5f * eps;
    if (fabsf (x) < cutoff) {
        r = copysignf (0.0f, x);
    } else if (fabsf (x) >= 1.0f) {
        r = x;
    } else {
        r = eps * rintf (x / eps);
    }
    return r;
}

Why do anything fancy instead of `return x - std::remainder(x, std::numeric_limits::epsilon();`? — Eric Postpischil, Aug 22 '23 at 21:15

How to round a float to epsilon overflow-correctly

1st Idea: Multiply and Floor

2nd Idea: Divide to Discard Precision

Question

2 Answers2