0

I have two doubles and I want to add them, divide them, etc., but every operation returns inf.

double num1 = 1.999999999999999999e+320;

double num_2 = 1.999999999999999e+320;

Are they out of range of double? How can I extend it or solve the problem?

Me NotMe
  • 1
  • 1
  • 1

5 Answers

2

Doubles (double-precision IEEE754) will only get you a range of about 10^±308 (from memory).

If you have an implementation that supports a wider long double type, you can use that. Now keep in mind that C99 implementations are allowed to treat long double as identical to double so this may not necessarily help you. From C99:

The C floating types match the IEC 60559 formats as follows:
- The float type matches the IEC 60559 single format.
- The double type matches the IEC 60559 double format.
- The long double type matches an IEC 60559 extended format, else a non-IEC 60559 extended format, else the IEC 60559 double format.

Any non-IEC 60559 extended format used for the long double type shall have more precision than IEC 60559 double and at least the range of IEC 60559 double.

'Extended' is IEC 60559’s double-extended data format. Extended refers to both the common 80-bit and quadruple 128-bit IEC 60559 formats.

A non-IEC 60559 long double type is required to provide infinity and NaNs, as its values include all double values.

But, if it uses the extended formats (e.g., 80 or 128-bit formats), that will give you a massive increase in range from the 64-bit double. The IEEE754 binary128 format will give you about 34 decimal digits of precision (up from the 15 you get from double) and a range of about 10^±4932 (up from 10^±308).

If it doesn't, or if that's still not enough range or precision, you can look into one of the arbitrary-precision libraries, like MPIR which, despite its name, is perfectly capable of handling real floating-point numbers (not just integers and rationals).

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
1

Use an arbitrary-precision mathematics library. Have a look at Arbitrary Precision Arithmetic for links to a number of them.

Ahmed Masud
  • 21,655
  • 3
  • 33
  • 58
  • 1
    I would go this route only after checking out `long double`. They're useful if you _need_ the precision or a range beyond what you have but, despite being pretty fast, they're still software and no match for hardware if your `long double` is done at that level (as quite a few 80-bit ones are, though less of the 128-bit ones - though this may have changed in the few years since I last looked into this). – paxdiablo Nov 03 '11 at 05:41
1

The long double data type does indeed have a greater range. For example, on my machine (64-bit linux), I get the following information:

Maximum value for double: 1.79769e+308
Maximum value for long double: 1.18973e+4932

Notice the larger exponent.

This information was found using the limits library in the C++ STL. An example can be found here.
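A minimal sketch of how such figures can be obtained with `std::numeric_limits` from the `<limits>` header follows. The function name `max_value_report` is made up for this example, and the numbers it produces depend on the implementation.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Build a two-line report of the largest finite double and long double.
std::string max_value_report() {
    std::ostringstream out;
    out << "Maximum value for double: "
        << std::numeric_limits<double>::max() << '\n'
        << "Maximum value for long double: "
        << std::numeric_limits<long double>::max() << '\n';
    return out.str();
}
```

Streaming the returned string to `std::cout` prints the report. If both lines show the same value, long double has the same range as double on that implementation.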

RobertR
  • 685
  • 5
  • 15
  • 3
    It does _not_ "indeed" have a greater range, and anecdotal evidence is not proof :-) It _may_ have on certain implementations but that's not mandated in any way by the standard. – paxdiablo Nov 03 '11 at 05:25
0

Have you tried long double or float? Why would you need such a large number anyway?

Ron Tan
  • 63
  • 1
  • 1
  • 6
  • Single precision floats won't help, they only have a +/- 10^38 range. – paxdiablo Nov 03 '11 at 05:12
  • i think long double is the same as double – Me NotMe Nov 03 '11 at 05:18
  • @MeNotMe, that actually depends on the implementation. It's quite legal to provide `long double` as the same type as `double`. – paxdiablo Nov 03 '11 at 05:24
  • @pax: ITYM same range, not the same type (that would break overloading). – MSalters Nov 03 '11 at 10:34
  • Yes, same range/precision/properties, not same _type._ I think it has to be IEEE754 extended, some other extended better than double and capable of representing all the special values, or same as double, in that order. Re the other comment that this is a comment rather than an answer, I don't agree with that. "Why would you need ..." is a bit presumptuous :-) but the suggestion to use `long double` is a genuine answer. – paxdiablo Nov 03 '11 at 11:37
0

If you just want to do simple operations like addition, multiplication and division, there is no need to use a third-party library. You could create your own class that handles such numbers and the operations you want.

From Wikipedia's article on scientific notation:

Scientific notation is a way of writing numbers that accommodates values too large or small to be conveniently written in standard decimal notation. Scientific notation has a number of useful properties and is commonly used in calculators, and by scientists, mathematicians, doctors, and engineers.

In scientific notation all numbers are written like this:

a × 10^b

("a times ten raised to the power of b"), where the exponent b is an integer, and the coefficient a is any real number.

So for your class you need a double corresponding to the coefficient a and an int or long int (for even larger numbers) that represents the exponent b.

Arithmetic Operations in Scientific Notation

Take two numbers N1 = a1E+b1 and N2 = a2E+b2.

Then we can handle the four classical arithmetic operations as follows:

Multiplication

N1*N2 = (a1*a2)E+(b1+b2)

Division

N1/N2 = (a1/a2)E+(b1-b2)

Of course you should handle division by zero.
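The two rules above might be sketched as follows. The `Sci` struct and the function names are hypothetical and exist only for illustration; the coefficient is not renormalized here, so it may drift outside the usual 1 to 10 range after repeated operations.

```cpp
#include <stdexcept>

// Hypothetical value type for illustration: represents a * 10^b.
struct Sci {
    double a;  // coefficient
    long b;    // decimal exponent
};

// Multiply coefficients, add exponents.
Sci multiply(const Sci& x, const Sci& y) {
    return {x.a * y.a, x.b + y.b};
}

// Divide coefficients, subtract exponents; reject a zero divisor.
Sci divide(const Sci& x, const Sci& y) {
    if (y.a == 0.0) {
        throw std::domain_error("division by zero");
    }
    return {x.a / y.a, x.b - y.b};
}
```

For instance, `multiply({2.0, 320}, {3.0, 320})` gives coefficient 6 and exponent 640, a value far beyond what a plain double can hold.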

Addition

You need some basic algebra to generalize it:

if (b1 >= b2)

N1+N2 = a1E+b1 + a2E+b2 = a1E+b1 + a2E+(b1+b2-b1) = (a1 + a2E+(b2-b1))E+b1

else

N1+N2 = a1E+b1 + a2E+b2 = a1E+(b2+b1-b2) + a2E+b2 = (a1E+(b1-b2) + a2)E+b2

EDIT

You should evaluate the parenthesized coefficient in the above equations as a double, then convert it to scientific notation again and apply the multiplication rule, e.g.

a1 + a2E+(b2-b1) = a3E+b3, so

N1+N2 = (a3E+b3)E+b1 = a3E+(b3+b1)
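A possible sketch of this addition rule follows: scale the operand with the smaller exponent, add the coefficients as doubles, then renormalize. The `Sci` struct and the `normalize` and `add` names are assumptions made for this example, and the use of `std::log10` to find the shift is one simple choice that may be off by one step for coefficients extremely close to a power of ten.

```cpp
#include <cmath>
#include <utility>

// Hypothetical value type for illustration: represents a * 10^b.
struct Sci {
    double a;  // coefficient
    long b;    // decimal exponent
};

// Rewrite a * 10^b so that the coefficient lies roughly in [1, 10).
Sci normalize(double a, long b) {
    if (a == 0.0) {
        return {0.0, 0};
    }
    long shift = static_cast<long>(std::floor(std::log10(std::fabs(a))));
    return {a / std::pow(10.0, static_cast<double>(shift)), b + shift};
}

// Add two numbers by expressing both relative to the larger exponent.
Sci add(Sci x, Sci y) {
    if (x.b < y.b) {
        std::swap(x, y);  // make x the operand with the larger exponent
    }
    double scaled = y.a * std::pow(10.0, static_cast<double>(y.b - x.b));
    return normalize(x.a + scaled, x.b);
}
```

Subtraction can reuse the same routine by negating the second coefficient, e.g. `add(x, {-y.a, y.b})`.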

Subtraction

Similarly to addition, for b1 >= b2 we have

N1-N2 = (a1 - a2E+(b2-b1))E+b1

and for b1 < b2

N1-N2 = (a1E+(b1-b2) - a2)E+b2

Implementation Skeleton

You will need the following:

  • A Constructor that has as argument a double, and calculates the exponent and the coefficient.
  • A Constructor that has as arguments the exponent b and the coefficient a
  • Operators for the arithmetic operations you want to support
  • Helper function for printing the number or converting it to scientific notation

A skeleton follows; the actual implementation is straightforward:

#include <string>

class MyLargeNumber {
public:
   MyLargeNumber(double d); // from d, find a and b and initialize the object
   MyLargeNumber(double a, long int b); // initialize directly

   double a() const; // get the coefficient
   long int b() const; // get the exponent

   // Operator overloading
   MyLargeNumber operator+(const MyLargeNumber &m) const;
   MyLargeNumber operator-(const MyLargeNumber &m) const;
   MyLargeNumber operator*(const MyLargeNumber &m) const;
   MyLargeNumber operator/(const MyLargeNumber &m) const;

   // Helper function
   std::string toString() const;
private:
   double a_; // the coefficient (a data member cannot share its name with the accessor a())
   long int b_; // the exponent
};
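As one possible way to fill in the two constructors and the printing helper, here is a stand-alone sketch. It uses a separate, hypothetical struct named `LargeNumberSketch` so that it compiles on its own, and it assumes the double passed to the converting constructor is finite.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical stand-alone sketch; the names are illustrative only.
struct LargeNumberSketch {
    double a;    // coefficient
    long int b;  // decimal exponent

    // Initialize directly from a coefficient and an exponent.
    LargeNumberSketch(double coeff, long int exp10) : a(coeff), b(exp10) {}

    // Split a finite double into a coefficient of magnitude roughly in
    // [1, 10) and a decimal exponent. Zero is stored as 0e+0.
    explicit LargeNumberSketch(double d) : a(0.0), b(0) {
        if (d != 0.0) {
            b = static_cast<long int>(std::floor(std::log10(std::fabs(d))));
            a = d / std::pow(10.0, static_cast<double>(b));
        }
    }

    // Format as "<a>e<sign><b>" using the stream's default precision.
    std::string toString() const {
        std::ostringstream out;
        out << a << 'e' << (b >= 0 ? "+" : "") << b;
        return out.str();
    }
};
```

Because the exponent is stored separately, a value such as `LargeNumberSketch(1.5, 320)` can be created and printed even though it does not fit in a double.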
pnezis
  • 12,023
  • 2
  • 38
  • 38