double representing values up to 16 significant digits

Question

I am reading values from a byte-positioned file into variables, and one value has to be represented as a decimal with 20 digits in total, the last 6 of which are the fractional part.

So the value could, for example, be 99999999999999999999 (twenty 9s), and it is to be used as a float with the last six 9s as the fractional part.

This is the method I tried:

#include<stdio.h>
#include<stdlib.h>

int main(void)
{
   char   c[]="999999999.999999";  //9 nines to the left of decimal and 6 nines to the right
   double d=0;

   sscanf(c,"%lf",&d);

   printf("%f\n",d);
   return 0;
}

OUTPUT: 999999999.999999 (same as input)

When I increased the number of 9s to the left of the decimal point by one (making 10 nines to the left), the output became 9999999999.999998.

Adding one more 9 to the left of the decimal point rounded the result to 100000000000.000000.

For my use case, values with 14 digits to the left of the decimal point and 6 to the right can occur, and I want them converted exactly as in the input, without any truncation or rounding. I also read somewhere that a double can represent a value with up to 16 significant digits, but 9999999999.999999 (10 nines to the left and 6 to the right) came out as 9999999999.999998, which contradicts the *represent a value with up to 16 significant digits* statement.

What should be done in this case?

Check the number of bits in the floating point representation. You've probably gone beyond the accuracy it can maintain for a decimal value converted to binary. What you read said *up to* 16 digits but there may be cases where it's just 15. — lurker, Jul 04 '20 at 17:57
@lurker Can you please provide any credible link to read about this — Agrudge Amicus, Jul 04 '20 at 18:02
Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — KamilCuk, Jul 04 '20 at 18:02
@KamilCuk But last 6 digits are used for representing fractional part - integer can't be used for that. — Agrudge Amicus, Jul 04 '20 at 18:06
As much as you might wish otherwise, [`double` only stores ~15 decimal digits](https://en.wikipedia.org/wiki/Double-precision_floating-point_format). — tadman, Jul 04 '20 at 18:13
@AgrudgeAmicus [If you consider wikipedia to be credible](https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats) then the exact number is 15.95. And if you follow [the link for binary64](https://en.wikipedia.org/wiki/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64), there's more detail about what that number actually means. But the point is moot. A 64-bit number, whether it's a `double` or a `uint64_t`, cannot represent a 20 digit number without truncation or rounding. — user3386109, Jul 04 '20 at 18:28
@AgrudgeAmicus ***C++ Primer 5th Edition** page 62* The exact quote is: "The `float` and `double` types typically yield about 7 and 16 significant digits, respectively." Note well the use of "typically" and "about". In fact, [the standard-required definitions `FLT_DIG` and `DBL_DIG` are almost certainly 6 and 15 on your system](https://stackoverflow.com/questions/9999221/double-precision-decimal-places), so it would be accurate to say your source isn't fully correct. — Andrew Henle, Jul 04 '20 at 19:13

Schwern · Accepted Answer · 2020-07-04T19:12:39.293

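The nature of floating point numbers is that they have finite precision, so most decimal values are stored only approximately. A typical IEEE 754 double has a 53-bit significand, which is about 15 to 16 decimal digits in total. The bigger the number, the fewer of those digits are left for the fractional part.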

What should be done in this case?

You could try a long double, but even that's not guaranteed to be precise enough.

long double z = 99999999999999.999999L;
printf("%Lf\n", z);  // 100000000000000.000000

You could store it as an integer and remember it's actually 1,000,000 times smaller. Unfortunately, 20 digits is a bit too large for even an unsigned 64 bit integer. It's 3 bits too large.

#include <inttypes.h>

int main() {
    uint64_t x = 99999999999999999999ULL;
}

test.c:4:18: error: integer literal is too large to be represented in any integer type
    uint64_t x = 99999999999999999999ULL;

You could make a struct that stores the pieces separately.

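#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
  bool positive;
  uint64_t integer;   // digits before the decimal point
  uint64_t decimal;   // the 6 digits after the decimal point
} bignum;

int main() {
    bignum num = {
        .positive = true,
        .integer = 99999999999999,
        .decimal = 999999
    };
    // %06 zero-pads the fraction, so that e.g. 42 prints as .000042
    printf("%s" "%"PRIu64 "." "%06"PRIu64 "\n",
        num.positive ? "" : "-", num.integer, num.decimal
    );
}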

At that point you're building your own arbitrary-precision arithmetic library. Instead, use an existing one such as GMP, which can store numbers of any size. The trade-off is speed, and you have to use its special types and functions.

#include <gmp.h>

// Link with -lgmp
int main() {
    mpf_t y;
    mpf_init_set_str(y, "99999999999999.999999", 10);
    gmp_printf("%Ff\n", y);
    mpf_clear(y);  // release the memory held by y
}

`uint64_t x = 99999999999999999999UL;` is probably better as `uint64_t x = 99999999999999999999ULL;` (`unsigned long long`) as `long` will be 32 bits on Windows even for 64-bit binaries. — Andrew Henle, Jul 04 '20 at 18:59
@AndrewHenle Thanks. It's good to show the proper integer constant. — Schwern, Jul 04 '20 at 19:13
