Is there a bit magic to calculate machine epsilon for IEEE 754 floating-point numbers other than 1?

Question

In other words, I want to have a function as follows.

Input: a positive number a representable by IEEE 754

Output: the maximal number e such that a + e == a holds with IEEE 754 arithmetics (e is also representable by IEEE 754)

std::numeric_limits<double>::epsilon gives the result with a being 1.
Such a functionality can be achieved by naïve bisection. Is there a method costing much fewer CPU cycles? Methods other than bit magic are also welcome, as long as they are fast.

The std::next_after(a) - a would give epsilon but not what OP asks something that is almost epsilon but no cigar. — Öö Tiib, Mar 14 '23 at 15:41
@ÖöTiib: Yes, to get the number he's describing, you'd do `std::nexttoward(std;:nextafter(a)-a, 0.0);`. This is pretty much just a roundabout way of computing the same thing described in my answer. For what it's worth, these functions do *not* have any underscores in their names. — Jerry Coffin, Mar 14 '23 at 15:44
There is no “bit magic.” However, code for finding the ULP using floating-point operations is in [this answer](https://stackoverflow.com/a/53599277/298225). That is in C but easily adaptable to C++. You can also use the standard `frexp` and `ldexp` functions to get and manipulate the significands and exponents of floating-point representations. — Eric Postpischil, Mar 14 '23 at 15:53
@EricPostpischil I think OP is asking for either of `nexttoward(ULP, 0.0)`, `ULP*0.5`, `nexttoward(ULP*0.5, 0.0)` or `0.0` -- depending on the current rounding mode and whether the last bit of `a` is set. — chtz, Mar 14 '23 at 17:24
Pseudo-code which might work with "standard" ROUND_HALF_EVEN rounding mode (`x.as()` shall mean `bitcast(x)`): `ULP = (x.as() + 1).as() - x; e = ((ULP*0.5).as() - (x.as() & 1)).as();` (will not work for subnormals, and of course untested ...) — chtz, Mar 14 '23 at 17:34
@chtz: Their title asks for the “machine epsilon” for arbitrary numbers. Their text says `std::numeric_limits::epsilon` gives the desired result for 1. Both of those point to the ULP. Other text asks for the maximum `e` such that `a+e == a`. That is two out of three indications that point to ULP and one that does not, and the one that does not is similar to a common mistaken description of ULP. — Eric Postpischil, Mar 14 '23 at 17:42
@EricPostpischil I agree, the question is not well-asked ... — chtz, Mar 14 '23 at 17:50

Eric Postpischil · Answer 1 · 2023-03-14T16:48:23.223

This code computes the ULP of a number, assuming an IEEE-754 binary format:

typedef double FloatType;


/*  Return the ULP of q.

    This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
    Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
    05.12_, Faculty for Information and Communication Sciences, Hamburg
    University of Technology, November 13, 2005.

    Since this uses subnormals, it may have poor performance.
*/
FloatType ULP(FloatType q)
{
    static const FloatType Epsilon = std::numeric_limits<FloatType>::epsilon();
    static const FloatType Minimum = std::numeric_limits<FloatType>::denorm_min();
        
    /*  Scale is .75 ULP, so multiplying it by any significand in [1, 2) yields
        something in [.75 ULP, 1.5 ULP) (even with rounding).
    */
    static const FloatType Scale = 0.75 * Epsilon;

    q = std::fabs(q);

    /*  In fmaf(q, -Scale, q), we subtract q*Scale from q, and q*Scale is
        something more than .5 ULP but less than 1.5 ULP.  That must produce q
        - 1 ULP.  Then we subtract that from q, so we get 1 ULP.

        The significand 1 is of particular interest.  We subtract .75 ULP from
        q, which is midway between the greatest two floating-point numbers less
        than q.  Since we round to even, the lesser one is selected, which is
        less than q by 1 ULP of q, although 2 ULP of itself.
    */
    return std::fmax(Minimum, q - std::fma(q, -Scale, q));
}

chux - Reinstate Monica · Answer 2 · 2023-03-15T04:05:30.100

OP's goals have some conflicts about what epsilon means.

Let's go with the C++ spec.

Machine epsilon: the difference between 1 and the least value greater than 1 that is representable.

... and assume OP wants that value for other than 1.0, some double x.

This is simply

#include <cmath>

double epsilon_per_x(double x) {
  x = fabs(x);
  return nextafter(x, INFINITY) - x;
}

We may want something different for the max value rather than returning infinity.

#include <cmath>

double epsilon_per_x(double x) {
  x = fabs(x);
  if (x >= std::numeric_limits<double>::max()) {
    return (nextafter(x/2, INFINITY) - x/2)*2;  
  } else {
    return nextafter(x, INFINITY) - x;
  }
}

Is there a bit magic to calculate machine epsilon for IEEE 754 floating-point numbers other than 1?

2 Answers2