C/C++ Arithmetic expression with double numbers - strange results - translating in java

Question

FIRST PROBLEM

In C code I have this expression:

double completeExpression  = x1 - h*exp(-lambda*t);

I have split it into two operations:

double value = h*exp(-lambda*t);
double subtraction = x1 - value;

The problem is that subtraction differs from completeExpression. Why?

I have reproduced the strange result with these lines:

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double TOLERANCE = 1e-16;
    double h = 0.51152525298500628;
    double lambda = 0.99999999999999978;
    double t = 0.1;
    double x1 = 0.4628471891711442;

    /* Same formula, computed in one expression and in two steps */
    double completeExpression = x1 - h*exp(-lambda*t);
    double value = h*exp(-lambda*t);
    double subtraction = x1 - value;

    printf("x1 = %1.4e & value = %1.4e",x1,value);
    printf("\ncompleteExpression = %1.4e",completeExpression);
    printf("\nsubtraction = %1.4e",subtraction);
    return 0;
}

Results:

x1 = 4.6285e-001 & value = 4.6285e-001
completeExpression = 8.2779e-017
subtraction = 5.5511e-017

SECOND PROBLEM:

I have to translate completeExpression to Java, but there I always get the "bad" result (the subtraction value) and never the completeExpression value:

Code:

public class Main {  // class name is arbitrary (assumed for illustration)
    static double TOLERANCE = 1e-16;

    public static void main(String[] args) {

        double h = 0.51152525298500628;
        double lambda = 0.99999999999999978;
        double t = 0.1;
        double x1 = 0.4628471891711442;

        double completeExpression = x1 - h * Math.exp(-lambda * t);
        double value = h * Math.exp(-lambda * t);
        double subtraction = x1 - value;

        System.out.println("x1 = " + String.format("%1.4e", x1) + " & value = " + String.format("%1.4e", value));
        System.out.println("\ncompleteExpression = " + String.format("%1.4e", completeExpression));
        System.out.println("\nsubtraction = " + String.format("%1.4e", subtraction));
    }
}

My GCC version:
$ gcc --version
gcc.exe (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Check this: http://stackoverflow.com/questions/31418209/double-multiplication-differs-between-compile-time-and-runtime-in-32-bit-platfor — ouah, Sep 24 '15 at 21:36
Looks like a very common computational margin when dealing with non-integers — YePhIcK, Sep 24 '15 at 21:39
Floating point calculations are inexact; consequently, relying on exact results (e.g. testing whether two numbers are unequal) is a bad idea. If you increase the precision in your `printf`s, you should be able to see the difference there too. — , Sep 24 '15 at 21:42
I [can't reproduce it](http://coliru.stacked-crooked.com/a/9eeafaede88116fb), which compiler are you using? — alain, Sep 24 '15 at 21:45
Try representing the mathematically exact real value of `1 / 7` with only 16 fractional decimal digits. — too honest for this site, Sep 24 '15 at 21:49
After changing the check to `Math.abs(...) > TOLERANCE`, it [works with java too](http://ideone.com/UNKsIJ) — alain, Sep 24 '15 at 22:04
@alain I don't understand why, in the C code, completeExpression = 8.2779e-017 while subtraction = 5.5511e-017 — michele, Sep 24 '15 at 22:08
But how do you get these numbers? Both the C and Java examples produce the exact same numbers. — alain, Sep 24 '15 at 22:10
@alain On my notebook with gcc I get these two different results, 8.2779e-017 and 5.5511e-017. If I code it in Java I always get 5.5511e-017 — michele, Sep 24 '15 at 22:13
Ok, that's really strange. However: 5.5511e-17 is the correct result, and since you are translating it to Java, and Java produces the correct result, it should be no problem ;-) — alain, Sep 24 '15 at 22:19
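The suggestion above to increase the precision in the `printf` calls can be sketched as follows. This is only an illustration: `prints_identically` is a hypothetical helper, not part of the asker's code. It shows that two adjacent doubles near x1 look the same with `%1.4e` but differ with `%.17g`, which prints enough significant digits to tell any two distinct doubles apart.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a and b produce the same text
   when printed with the given printf format, 0 otherwise. */
static int prints_identically(const char *fmt, double a, double b)
{
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, fmt, a);
    snprintf(sb, sizeof sb, fmt, b);
    return strcmp(sa, sb) == 0;
}
```

For example, `prints_identically("%1.4e", x1, value)` can return 1 even when `x1 != value`, which is why both values in the question print as 4.6285e-001.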

YePhIcK · Answer 1 · 2015-09-24T21:58:33.953

1

Floating-point results (unlike integer results) are rarely bit-for-bit identical when they are reached by different routes. The reason lies in the way the values are stored: a mantissa with a fixed number of bits, plus an exponent.

In the end you can never be sure that two floating-point numbers are the same after performing the "same" operations on them. More to the point, a test such as if(subtraction!=completeExpression) is generally unreliable. Instead you should be looking for a "close match":

if( fabs(subtraction - completeExpression) < TOLERANCE )

where TOLERANCE is some constant you have, like const double TOLERANCE = 1e-16;. In C, use fabs from <math.h>: abs takes an int, so it would truncate the difference to 0.
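A minimal sketch of such a "close match" test in C. The function names are hypothetical. The absolute version mirrors the comparison above. The relative version is an assumption about what may suit values whose magnitude is far from 1, where a fixed tolerance such as 1e-16 is either too strict or too loose.

```c
#include <math.h>

/* Absolute tolerance: true if a and b differ by less than tol. */
static int nearly_equal(double a, double b, double tol)
{
    return fabs(a - b) < tol;
}

/* Relative tolerance (assumed variant): scales the allowed
   difference by the larger of the two magnitudes. */
static int nearly_equal_rel(double a, double b, double rel_tol)
{
    double fa = fabs(a), fb = fabs(b);
    double scale = fa > fb ? fa : fb;
    return fabs(a - b) <= rel_tol * scale;
}
```

With the values from the question, `nearly_equal(8.2779e-17, 5.5511e-17, 1e-16)` treats the two results as equal, since they differ by about 2.7e-17.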

For more information on why floating-point numbers are "approximate", you can read the Wikipedia article on floating point. The basic reason is that the range of values a floating-point type covers is far larger than the number of distinct digits that can be encoded in the same number of bits.

A 32-bit integer can encode values from about -2.1e9 to +2.1e9 (-2^31 to 2^31 - 1), but a 32-bit float's range runs all the way from -3.4e38 to +3.4e38. That is a difference of about 29 decimal orders of magnitude!

For 64-bit values the gap is even bigger: roughly 9.2e18 for the integer against 1.8e308 for the double, which is almost 290 orders of magnitude.
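The limits quoted above can be checked with the standard macros from `<stdint.h>` and `<float.h>`. This is just a sketch, and `print_ranges` is a hypothetical name.

```c
#include <float.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the largest value of each integer and floating-point type
   compared above, plus the largest decimal exponent of the float types. */
static void print_ranges(void)
{
    printf("int32_t max: %" PRId32 "\n", INT32_MAX);
    printf("float   max: %e (decimal exponent %d)\n", FLT_MAX, FLT_MAX_10_EXP);
    printf("int64_t max: %" PRId64 "\n", INT64_MAX);
    printf("double  max: %e (decimal exponent %d)\n", DBL_MAX, DBL_MAX_10_EXP);
}
```

On a platform with IEEE 754 float and double, the decimal exponents reported are 38 and 308.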

That extended range comes at a price: only part of the 32 or 64 bits holds the "precision" digits (the mantissa, in binary, not in decimal), and the number of those bits is what limits the final precision of your floating-point numbers: 24 significant bits for a 32-bit float and 53 for a 64-bit double.
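To make the precision limit concrete, here is a small sketch (with a hypothetical helper name) that tests whether a 32-bit integer survives a round trip through a 32-bit float. It assumes an IEEE 754 float with a 24-bit mantissa, so not every integer above 2^24 = 16777216 can be stored exactly.

```c
#include <stdint.h>

/* Returns 1 if n can be converted to float and back without change. */
static int fits_in_float(int32_t n)
{
    volatile float f = (float)n;  /* volatile forces rounding to float */
    return (int64_t)f == n;       /* widen to 64 bits so 2^31 stays in range */
}
```

Under that assumption, `fits_in_float(16777216)` is true while `fits_in_float(16777217)` is false, because 16777217 would need 25 significant bits.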

Generally speaking, two expressions that are mathematically equal, such as 0.1 + 0.2 and 0.3, can still produce different values once they are evaluated in IEEE 754 binary floating point.
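A short sketch of how mathematically equal expressions can yield different doubles: addition is associative for real numbers, but each floating-point addition rounds its result, so the grouping matters. The helper name is hypothetical, and the `volatile` stores are there to force each intermediate result to be rounded to double.

```c
/* Returns 1 if (a + b) + c and a + (b + c) give the same double. */
static int addition_is_associative(double a, double b, double c)
{
    volatile double ab = a + b;
    volatile double bc = b + c;
    volatile double left = ab + c;   /* (a + b) + c */
    volatile double right = a + bc;  /* a + (b + c) */
    return left == right;
}
```

With IEEE 754 doubles, `addition_is_associative(0.1, 0.2, 0.3)` returns 0, while `addition_is_associative(1.0, 2.0, 3.0)` returns 1 because every intermediate value is exactly representable.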

edited Sep 24 '15 at 21:58

answered Sep 24 '15 at 21:48

YePhIcK


The issue is not the number of decimal places that can be represented by the combination of the mantissa and exponent, but the number that can be represented by the mantissa alone. The numbers in the OP's original post (except 0.1) all look as though they are inexactly represented. – marko Sep 24 '15 at 21:54
I'll re-write my answer to make this point more clear (I agree - the mantissa is what limits the precision) – YePhIcK Sep 24 '15 at 21:55
@YePhlcK thanks for the suggestion, I edited the code. Unfortunately the problem is still there – michele Sep 24 '15 at 21:56
@michele - try to play with the `TOLERANCE`. Most likely you need to increase it (I don't know the exact value that is "good enough" for your particular application) – YePhIcK Sep 24 '15 at 22:00
@YePhlcK these values are used in other expressions, and for particular input combinations the final results differ even more. – michele Sep 24 '15 at 22:02
also... in your particular code the correct statement would be `if( Math.abs(subtraction - completeExpression) > TOLERANCE )` since you are looking for "false" comparison. – YePhIcK Sep 24 '15 at 22:02
@YePhIcK But the main problem is that in the C code I get completeExpression = 8.2779e-017 and subtraction = 5.5511e-017. These different values lead to different results in the final calculations. – michele Sep 24 '15 at 22:10
@michele - this is a great point and the answer to it is to *be careful when doing floating-point math* as the result **may** depend on the sequence of your computations – YePhIcK Sep 24 '15 at 22:16
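One possible reason why the single expression and the two-step version can disagree is that some C compilers evaluate intermediate results in a wider format than double and round only when a value is stored in a variable. This is offered as an assumption, not as a diagnosis of the asker's build. C99 reports the evaluation mode through the `FLT_EVAL_METHOD` macro in `<float.h>`. The sketch below only reports it, and the function name is hypothetical.

```c
#include <float.h>

/* Describes how this compiler evaluates floating-point expressions,
   based on the C99 macro FLT_EVAL_METHOD. */
static const char *eval_method_name(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "implementation-defined or indeterminable";
    }
#else
    return "FLT_EVAL_METHOD is not defined";
#endif
}
```

If this reports evaluation as long double, `x1 - h*exp(-lambda*t)` may keep the product in extended precision, while storing it in `value` first rounds it to double. That could account for two different results.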
