1

In other words, I want to have a function as follows.

Input: a positive number a representable by IEEE 754

Output: the maximal number e such that a + e == a holds with IEEE 754 arithmetics (e is also representable by IEEE 754)

  • std::numeric_limits<double>::epsilon gives the result with a being 1.
  • Such a functionality can be achieved by naïve bisection. Is there a method costing much fewer CPU cycles? Methods other than bit magic are also welcome, as long as they are fast.
NathanOliver
  • 171,901
  • 28
  • 288
  • 402
Eli4ph
  • 317
  • 2
  • 15
  • 1
    The std::next_after(a) - a would give epsilon but not what OP asks something that is almost epsilon but no cigar. – Öö Tiib Mar 14 '23 at 15:41
  • @ÖöTiib: Yes, to get the number he's describing, you'd do `std::nexttoward(std;:nextafter(a)-a, 0.0);`. This is pretty much just a roundabout way of computing the same thing described in my answer. For what it's worth, these functions do *not* have any underscores in their names. – Jerry Coffin Mar 14 '23 at 15:44
  • There is no “bit magic.” However, code for finding the ULP using floating-point operations is in [this answer](https://stackoverflow.com/a/53599277/298225). That is in C but easily adaptable to C++. You can also use the standard `frexp` and `ldexp` functions to get and manipulate the significands and exponents of floating-point representations. – Eric Postpischil Mar 14 '23 at 15:53
  • @EricPostpischil I think OP is asking for either of `nexttoward(ULP, 0.0)`, `ULP*0.5`, `nexttoward(ULP*0.5, 0.0)` or `0.0` -- depending on the current rounding mode and whether the last bit of `a` is set. – chtz Mar 14 '23 at 17:24
  • Pseudo-code which might work with "standard" ROUND_HALF_EVEN rounding mode (`x.as()` shall mean `bitcast(x)`): `ULP = (x.as() + 1).as() - x; e = ((ULP*0.5).as() - (x.as() & 1)).as();` (will not work for subnormals, and of course untested ...) – chtz Mar 14 '23 at 17:34
  • 1
    @chtz: Their title asks for the “machine epsilon” for arbitrary numbers. Their text says `std::numeric_limits::epsilon` gives the desired result for 1. Both of those point to the ULP. Other text asks for the maximum `e` such that `a+e == a`. That is two out of three indications that point to ULP and one that does not, and the one that does not is similar to a common mistaken description of ULP. – Eric Postpischil Mar 14 '23 at 17:42
  • @EricPostpischil I agree, the question is not well-asked ... – chtz Mar 14 '23 at 17:50

2 Answers2

2

This code computes the ULP of a number, assuming an IEEE-754 binary format:

typedef double FloatType;


/*  Return the ULP of q.

    This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
    Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
    05.12_, Faculty for Information and Communication Sciences, Hamburg
    University of Technology, November 13, 2005.

    Since this uses subnormals, it may have poor performance.
*/
FloatType ULP(FloatType q)
{
    static const FloatType Epsilon = std::numeric_limits<FloatType>::epsilon();
    static const FloatType Minimum = std::numeric_limits<FloatType>::denorm_min();
        
    /*  Scale is .75 ULP, so multiplying it by any significand in [1, 2) yields
        something in [.75 ULP, 1.5 ULP) (even with rounding).
    */
    static const FloatType Scale = 0.75 * Epsilon;

    q = std::fabs(q);

    /*  In fmaf(q, -Scale, q), we subtract q*Scale from q, and q*Scale is
        something more than .5 ULP but less than 1.5 ULP.  That must produce q
        - 1 ULP.  Then we subtract that from q, so we get 1 ULP.

        The significand 1 is of particular interest.  We subtract .75 ULP from
        q, which is midway between the greatest two floating-point numbers less
        than q.  Since we round to even, the lesser one is selected, which is
        less than q by 1 ULP of q, although 2 ULP of itself.
    */
    return std::fmax(Minimum, q - std::fma(q, -Scale, q));
}
Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
2

OP's goals have some conflicts about what epsilon means.

Let's go with the C++ spec.

Machine epsilon: the difference between 1 and the least value greater than 1 that is representable.

... and assume OP wants that value for other than 1.0, some double x.


This is simply

#include <cmath>

double epsilon_per_x(double x) {
  x = fabs(x);
  return nextafter(x, INFINITY) - x;
}

We may want something different for the max value rather than returning infinity.

#include <cmath>

double epsilon_per_x(double x) {
  x = fabs(x);
  if (x >= std::numeric_limits<double>::max()) {
    return (nextafter(x/2, INFINITY) - x/2)*2;  
  } else {
    return nextafter(x, INFINITY) - x;
  }
}
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256