4

I am dealing with a code which uses very small numbers of order 10^-15 to 10^-25. I tried using double and long double but i get a wrong answer as either 0.000000000000000000001 is rounded off to 0 or a number like 0.00000000000000002 is represented as 0.00000000000000001999999999999.

Since even a small fraction of 1/1000000 makes a huge difference in my final answers, is there an appropriate fix?

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <iomanip>

using namespace std;

int main()
{
    double  sum, a, b, c,d;
    a=1;
    b=1*pow(10,-15);
    c=2*pow(10,-14);
    d=3*pow(10,-14);
    sum=a+b+c+d;
    cout<<fixed;
    cout<<setprecision(30);
    cout<<" a   : "<<a<<endl<<" b   : "<<b<<endl<<" c   : "<<c<<endl
        <<" d   : "<<d<<endl; 
    cout<<" sum : "<<sum<<endl<<endl;
    a=a/sum;
    b=b/sum;
    c=c/sum;
    d=d/sum;
    sum=a+b+c+d;
    cout<<" a   : "<<a<<endl<<" b   : "<<b<<endl<<" c   : "<<c<<endl
        <<" d   : "<<d<<endl; 
    cout<<" sum2: "<<sum<< endl;
    return 0;
}

The expected output should be:

a   : 1.000000000000000000000000000000
b   : 0.000000000000001000000000000000
c   : 0.000000000000020000000000000000
d   : 0.000000000000030000000000000000
sum : 1.000000000000051000000000000000

a   : 1.000000000000000000000000000000
b   : 0.000000000000001000000000000000
c   : 0.000000000000020000000000000000
d   : 0.000000000000030000000000000000
sum1: 1.000000000000051000000000000000

But, the output I get is:

a   : 1.000000000000000000000000000000
b   : 0.000000000000001000000000000000
c   : 0.000000000000020000000000000000
d   : 0.000000000000029999999999999998
sum : 1.000000000000051100000000000000

a   : 0.999999999999998787999878998887
b   : 0.000000000000000999999997897899
c   : 0.000000000000019999999999999458
d   : 0.000000000000029999999999996589
sum1: 0.999999999999989000000000000000

I tried double, long double and even boost_dec_float, but the output I get are similar.

Holt
  • 36,600
  • 7
  • 92
  • 139
Md_Asher
  • 69
  • 2
  • 3
  • Make a struct that is your own representation of the number; with a short for the exponential, and a double for the rest; Or maybe change the units so that you're not working with such small numbers? – UKMonkey Jan 10 '17 at 10:42
  • 3
    I would argue this is not a duplicate. The OP knows exactly what is happening, and is asking for a solution to increase precision. – SingerOfTheFall Jan 10 '17 at 10:44
  • 1
    So your problem is that `3*pow(10,-14)` is being displayed as 0.000000000000029999999999999998 rather than 0.000000000000030000000000000000. That is pretty much on the limits of the precision of double. – Martin Bonner supports Monica Jan 10 '17 at 10:52
  • @SingerOfTheFall Using IEEE double and then expecting `cout< – n. m. could be an AI Jan 10 '17 at 10:54
  • Dear OP, before complaining, calculate the difference in percentage. I'm not sure you're aware how precise these results are. Could this be an http://xyproblem.info/ ? – Karoly Horvath Jan 10 '17 at 11:02

2 Answers2

3

As you noticed, this happens because the numbers cannot be accurately represented in binary, and are rounded to a certain degree.

Now, since you tagged this with boost tag, boost has boost.multiprecision that does exactly what you want. It offers cpp_dec_float_50 data type which ensures exact computation up to 50 decimal digits. It is used as any other type:

typedef boost::multiprecision::cpp_dec_float_50 value_type;

value_type v1 = 1;
value_type v2 = 3;

value_type v3 = v1 / v2;

According to boost doc, this is guaranteed to output only the precise bits:

cpp_dec_float_50 seventh = cpp_dec_float_50(1) / 7;
cpp_dec_float_50 circumference = boost::math::constants::pi<cpp_dec_float_50>() * 2 * seventh;
std::cout.precision(std::numeric_limits<cpp_dec_float_50>::digits10);
std::cout << circumference << std::endl;
SingerOfTheFall
  • 29,228
  • 8
  • 68
  • 105
0

I bet you are writing:

long double  sum, a, b, c,d;
a=1;
b=1*pow(10,-15);
c=2*pow(10,-14);
d=3*pow(10,-14);

The trouble is that the pow will be the double version of pow - not the long double version. You need to force one of the arguments to long double:

long double  sum, a, b, c,d;
a=1;
b=1*pow(10.0L,-15);
c=2*pow(10.0L,-14);
d=3*pow(10.0L,-14);