epsilon for various float values

Question

There is FLT_MIN constant that is nearest to zero. How to get nearest to some number value?

As an example:

float nearest_to_1000 = 1000.0f + epsilon;
// epsilon must be the smallest value satisfying condition:
// nearest_to_1000 > 1000.0f

I would prefer numeric formula without using special functions.

In IEEE754, reinterpret the `float` as `uint32_t`, increment by one and reinterpret back (modulo endianness). — Kerrek SB, Aug 09 '12 at 12:55
Also see [this question](http://stackoverflow.com/q/7408407/596781). — Kerrek SB, Aug 09 '12 at 13:11
By the way, FLT_MIN is not the float nearest zero. It is the smallest normal float. The denormals are smaller. With IEEE 754, FLT_EPSILON * FLT_MIN is the smallest positive float. — Eric Postpischil, Aug 09 '12 at 17:03

Eric Postpischil · Accepted Answer · 2021-03-19T18:12:52.357

C provides a function for this, in the <math.h> header. nextafterf(x, INFINITY) is the next representable value after x, in the direction toward INFINITY.

However, if you'd prefer to do it yourself:

The following returns the epsilon you seek, for single precision (float), assuming IEEE 754. See notes at the bottom about using library routines.

#include <float.h>
#include <math.h>


/*  Return the ULP of q.

    This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
    Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
    05.12_, Faculty for Information and Communication Sciences, Hamburg
    University of Technology, November 13, 2005.
*/
float ULP(float q)
{
    // SmallestPositive is the smallest positive floating-point number.
    static const float SmallestPositive = FLT_EPSILON * FLT_MIN;

    /*  Scale is .75 ULP, so multiplying it by any significand in [1, 2) yields
        something in [.75 ULP, 1.5 ULP) (even with rounding).
    */
    static const float Scale = 0.75f * FLT_EPSILON;

    q = fabsf(q);

    /*  In fmaf(q, -Scale, q), we subtract q*Scale from q, and q*Scale is
        something more than .5 ULP but less than 1.5 ULP.  That must produce q
        - 1 ULP.  Then we subtract that from q, so we get 1 ULP.

        The significand 1 is of particular interest.  We subtract .75 ULP from
        q, which is midway between the greatest two floating-point numbers less
        than q.  Since we round to even, the lesser one is selected, which is
        less than q by 1 ULP of q, although 2 ULP of itself.
    */
    return fmaxf(SmallestPositive, q - fmaf(q, -Scale, q));
}

The following returns the next value representable in float after the value it is passed (treating −0 and +0 as the same).

#include <float.h>
#include <math.h>


/*  Return the next floating-point value after the finite value q.

    This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
    Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
    05.12_, Faculty for Information and Communication Sciences, Hamburg
    University of Technology, November 13, 2005.
*/
float NextAfterf(float q)
{
    /*  Scale is .625 ULP, so multiplying it by any significand in [1, 2)
        yields something in [.625 ULP, 1.25 ULP].
    */
    static const float Scale = 0.625f * FLT_EPSILON;

    /*  Either of the following may be used, according to preference and
        performance characteristics.  In either case, use a fused multiply-add
        (fmaf) to add to q a number that is in [.625 ULP, 1.25 ULP].  When this
        is rounded to the floating-point format, it must produce the next
        number after q.
    */
#if 0
    // SmallestPositive is the smallest positive floating-point number.
    static const float SmallestPositive = FLT_EPSILON * FLT_MIN;

    if (fabsf(q) < 2*FLT_MIN)
        return q + SmallestPositive;

    return fmaf(fabsf(q), Scale, q);
#else
    return fmaf(fmaxf(fabsf(q), FLT_MIN), Scale, q);
#endif
}

Library routines are used, but fmaxf (maximum of its arguments) and fabsf (absolute value) are easily replaced. fmaf should compile to a hardware instruction on architectures with fused multiply-add. Failing that, fmaf(a, b, c) in this use can be replaced by (double) a * b + c. (IEEE-754 binary64 has sufficient range and precision to replaced fmaf. Other choices for double might not.)

Another alternative to the fused-multiply add would be to add some tests for cases where q * Scale would be subnormal and handle those separately. For other cases, the multiplication and addition can be performed separately with ordinary * and + operators.

The significand of q has some value between 1 and 2 (excluding 2). If the significand were exactly 1, then q*FLT_EPSILON would be exactly one ULP (the value of the least signficant bit in the significand of q, given its exponent), so q+q*FLT_EPSILON would be exactly the next representable value. However, suppose the significand is closer to 2. Then q*FLT_EPSILON is nearly 2 ULP, and q+q*FLT_EPSILON is very near the second-next representable value, instead of the next, and rounding would make the final result that second-next value. But... — Eric Postpischil, Aug 09 '12 at 13:52
q*.625*FLT_EPSILON lies between .625 ULP (when q’s significand is near 1) and 1.25 ULP (when q’s significand is near 2). So q+q*.625*FLT_EPSILON is always nearer the next representable value (q + 1 ULP) than it is to q or to q + 2 ULP. So rounding makes the result exactly q + 1 ULP, which is what we want. — Eric Postpischil, Aug 09 '12 at 13:53
Another subtlety is when q is negative and exactly a power of 2. Then the next representable number in the direction of INFINITY is not the normal q + 1 ULP but is q + 1/2 ULP, because the next representable number has a lower exponent, so the bits in its significand have half the value they do compared to the same bits in q’s significand. In this case, fabs(q)*.625*FLT_EPSILON is .625 ULP, so q + fabs(q)*.625*FLT_EPSILON is near q + 1/2 ULP, which is a representable number and is the number we want. — Eric Postpischil, Aug 09 '12 at 13:57
The .75 in the first routine is because that routine only needs to return the ULP; it does not need to deal with the stepping-between-powers-of-two issue with negative q. So its range of .75 to 1.5 is fine. But that would round incorrectly for the NextAfter routine, because q+fabs(q)*.75*FLT_EPSILON is q + .75 ULP, which is equally near the two representable numbers q + .5 ULP and q + 1 ULP, and the IEEE 754 rounding rules pick q + 1 ULP (because its low bit is even). So NextAfter uses .625 to ensure q + .5 ULP is nearer. — Eric Postpischil, Aug 09 '12 at 13:59
Take a look at my edit of your answer. The header you tried to name wasn't shown because it was seen as an HTML tag and stripped out. — Nic, Apr 19 '17 at 17:23

epsilon for various float values

1 Answers1

Linked

Related