
I need to determine the smallest value eps for a given double variable v such that

v+eps != v

Note that this is not the typical problem sheet task since eps depends on the arbitrary number v.

This should not be done by searching for the value in a loop. Is there a fast way to do this, e.g. by bit shifting, that is independent of the compiler, optimization flags, platform, etc.?

Thanks for your answers

urzeit
choosyg
  • possible duplicate of [How to find nearest next/previous double value (numeric\_limits::epsilon for given number)](http://stackoverflow.com/questions/10160079/how-to-find-nearest-next-previous-double-value-numeric-limitsepsilon-for-give) – rubenvb Oct 10 '13 at 09:29
  • @BoBTFish and also completely besides the question. – rubenvb Oct 10 '13 at 09:30
  • @rubenvb Yes, I started writing it, then couldn't quite get my head around what I meant about turning that into an epsilon. So I went off to do some reading. I'll just delete it until such time as I work out what I'm trying to say. – BoBTFish Oct 10 '13 at 09:33
  • 1
    The question and the title ask for different things. If d is the minimum change from (a positive) v to the next greater representable value and e is the smallest (positive) value such that calculating v+e does not produce v, then e is half d or very slightly greater, depending on the low bit of the significand of v. This is because, in floating-point arithmetic, a mathematical result just halfway to the next representable value is enough to cause rounding upward. (If it exactly halfway, rounding is down if the low bit of v is even, up if odd, which is why e may be slightly more than half d.) – Eric Postpischil Oct 10 '13 at 10:32
  • You should specify clearly whether you want d or e and change the title and the question text to match. (Note: This assumes IEEE 754 arithmetic in the common round-to-nearest mode.) – Eric Postpischil Oct 10 '13 at 10:34

3 Answers


The C99 function nextafter is what you need. Alternatively, use Boost.Math's nextafter. The result is by definition implementation-dependent, since it relies on the internal representation of double in memory.

For a comparison of all the methods presented in the answers here at the time of writing, see this live demo, which shows how the other solutions fail.


For reference, here is the test code if you want to run it on your own system:

#include <cmath>
#include <cfloat>
#include <limits>
#include <iostream>
using std::cout;
#include <iomanip>
using std::setprecision;

#include <boost/math/special_functions/next.hpp>

double
epsFor( double x )
{
  union
  {
    double d;
    unsigned long long i;
  } tmp;
  tmp.d = x;
  ++ tmp.i;
  return tmp.d - x;
}

void test(double d)
{
  double d1 = std::nextafter(d,DBL_MAX);
  double d2 = d+std::numeric_limits<double>::epsilon() * d;
  double d3 = d+epsFor(d);
  double d4 = boost::math::nextafter(d, DBL_MAX);
  cout << setprecision(40)
       << "For value of d = " << d << '\n'
       << " std::nextafter: " << d1 << '\n'
       << " Boost solution: " << d4 << '\n'
       << " undefined beh.: " << d3 << '\n'
       << " numeric_limits: " << d2 << '\n';
}

int main()
{
  test(0.1);
  test(986546357654.354687);
}
rubenvb
  • why the `*d` in `d+epsFor(d)*d`? – Henrik Oct 10 '13 at 09:59
  • @henrik because Ivan's answer does that. – rubenvb Oct 10 '13 at 10:01
  • well, no, Ivan's answer uses epsilon, not epsFor. – Henrik Oct 10 '13 at 10:04
  • What if the mantissa addition (you are adding +1 to the number, so it's added on the mantissa [the lsb zone]) results in an overflow? It modifies the exponent. And so on (the overflow could be propagated to the msb zone, modifying the exponent, then the sign, etc.). – Manu343726 Oct 10 '13 at 10:05
  • @Manu343726 With IEEE floating point, the results will still be correct. (There would be a problem, of course, if the overflow resulted in Inf.) – James Kanze Oct 10 '13 at 10:08
  • @JamesKanze the overflow results in `exponent += 1` and `mantissa := 000000000` right? It should be `exponent += 1` and `mantissa := 0000001`, isn't? – Manu343726 Oct 10 '13 at 10:15
  • Actually, my `epsFor` gives exactly the same results as `boost::nextafter`. The "live demo" which is linked to seems to show rather a bug in the compiler (or a misuse of the expressions---C++ is very lax with regards to the precision of the intermediate results). – James Kanze Oct 10 '13 at 10:41
  • Of course, the `boost::nextafter` or `std::nextafter`, if available, is preferable, because they handle all of the border cases (0, underflow, overflow, etc.) correctly. – James Kanze Oct 10 '13 at 10:56
  • @James, no, as Henrik pointed out, I made a mistake, it is now corrected and your code does indeed give the same as the `nextafter` solutions. – rubenvb Oct 10 '13 at 11:45

I'd use type punning:

double
epsFor( double x )
{
    union
    {
        double d;
        unsigned long long i;
    } tmp;
    tmp.d = x;
    ++ tmp.i;
    double results = tmp.d - x;
    return results;
}

(Formally, this is undefined behavior, but in practice, I don't know of a modern compiler where it will fail.)

EDIT:

Note that C++ allows excess precision in intermediate expressions; since we're concerned here with exact results, the originally posted function could give wrong results if you used it directly in an expression rather than assigning it to a double. I've added an assignment in the function to avoid this, but be aware that a lot of compilers are not standard-conformant in this regard, at least by default. (g++ is a good example of one where you need a special option to get conformant behavior, at least when optimization is turned on. If you're using g++, you must specify the -ffloat-store option if you want correct results.)

James Kanze
  • Do you know of any guarantee with respect to the size of `i`? Wouldn't it be safer to use ``uint64_t``? – Jonas Schäfer Oct 10 '13 at 09:41
  • This will return negative values if x is negative. Not sure, if that's what the OP wants. +1 anyway. – Henrik Oct 10 '13 at 09:42
  • 1
    Definitely going to downvote for the undefined behaviour. You can set optimization flags in GCC and Clang that will break this. – Puppy Oct 10 '13 at 09:57
  • @JonasWielicki Any use of this is clearly implementation dependent. I used the appropriate types for the most common platforms (Intel and AMD, Sparc, Power PC, etc.). In this case, I'm not sure `uint64_t` would buy anything: it will cause the compilation to fail on a few exotic platforms (Unisys mainframes, for example), which might be an advantage. (Or not. I don't know what `long long` is on those platforms, and it may be that using `uint64_t` would cause compilation to fail although the above would work.) – James Kanze Oct 10 '13 at 10:07
  • @DeadMG G++ has always guaranteed this in the past. You can work around the undefined behavior using `memcpy` if you wish; the "intent" of the standard is that type punning should work with `reinterpret_cast`, but in fact, g++ only guarantees it with the `union` (which is the most widespread way of accomplishing it). And since the above is clearly only valid for IEEE (and similar representations which use an implicit MSB in the mantissa), I'm only worried about what really happens in actual practice. – James Kanze Oct 10 '13 at 10:55
  • `-ffloat-store` does not entirely fix the issue with higher-precision intermediate results, as it causes double-rounding instead. The solution is to embrace higher-precision and make it part of the definition of the language's semantics. This is what the C99 compiler-defined FLT_EVAL_METHOD is for. Recent C++ standards also define FLT_EVAL_METHOD, with the same meaning as in C. – Pascal Cuoq Oct 10 '13 at 13:53
  • @PascalCuoq `FLT_EVAL_METHOD` doesn't really help here, since I can't control it. And none of the standard defined values seem to correspond to what g++ actually does (nor with some options, what VC++ does). – James Kanze Oct 10 '13 at 17:36
  • @JamesKanze Do recent g++ versions accept the `-fexcess-precision=standard` flag? I understand http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html as meaning that a recent g++ should understand it. (Note: I am aware that we are getting sidetracked here slightly, but we are on-topic in the sense that `-fexcess-precision=standard`, if accepted, should make g++ drop excess precision on assignments, something you refer to in your answer) – Pascal Cuoq Oct 10 '13 at 17:43
  • @PascalCuoq I haven't verified. The _documentation_ says that `-fexcess-precision` only applies to C, which would mean not to C++. But it would definitely make sense in C++ as well. – James Kanze Oct 10 '13 at 18:08
  • May I suggest you try the program in section “A ray of hope” in http://blog.frama-c.com/index.php?post/2013/07/24/More-on-FLT_EVAL_METHOD_2 with `g++ -fexcess-precision=standard`? It is impossible in theory to disprove the assertion “the compiler does whatever it pleases with floating-point computations”, but in practice, when GCC or Clang are breaking the semantics suggested by the standard, they break them badly. – Pascal Cuoq Oct 10 '13 at 21:11
  • I get “cc1plus: sorry, unimplemented: -fexcess-precision=standard for C++” with SVN GCC snapshot 172652, the most recent g++ I have around, from 2011-04-18. – Pascal Cuoq Oct 10 '13 at 21:17
eps = std::numeric_limits<double>::epsilon() * v;
Ivan Ishchenko
  • Not actually the correct answer here, thanks to rounding... v + v*eps*(2/3), for instance. – Sneftel Oct 10 '13 at 09:28
  • That is the difference between `1.0` and the next representable `double`. The gap will be larger for larger `double`s – BoBTFish Oct 10 '13 at 09:31
  • 1
    @BoBTFish : epsilon() is, that's why you should multiply it with v. – Ivan Ishchenko Oct 10 '13 at 09:32
  • 2
    The gap is not linearly increasing. – Puppy Oct 10 '13 at 09:56
  • 1
    @DeadMG Considering that the OP does not know whether he wants ULP(v) or half of it, I find this answer good enough. `v * (1 + epsilon) - v` computes the ULP half the times without using nextafter() or type-punning, but half the times it evaluates to 2ULPs. Perhaps something like `v + (v * epsilon / 2) - v` can be made to work in all cases, but as written I still fear it is wrong on some cases. – Pascal Cuoq Oct 10 '13 at 14:01