
Could you explain why the difference is 1024 instead of 1000?

#include <math.h>
#include <stdio.h>

int main(void) {
   unsigned long long int x = pow(2,63);
   double y = pow(2,63) - 1000;
   double z = 9223372036854775808.0 - 1000.0;
   printf("%llu\n%f\n%f\n", x,y,z);
}

Output is:

9223372036854775808
9223372036854774784.000000
9223372036854774784.000000
Jonathan Leffler
rsltgy
  • Every time a float gets involved... – Martin James Jul 21 '18 at 20:04
  • The result does not change if I do the subtraction on an unsigned long long int: `unsigned long long int x = pow(2,63) - 1000;` gives 9223372036854774784. – rsltgy Jul 21 '18 at 20:06
  • This question is repeated again and again. – 0___________ Jul 21 '18 at 23:19
  • @rsltgy: `pow(2,63)` is a double, and so is `pow(2,63)-1000`. The fact that you will later convert it to a `unsigned long long` doesn't change the fact that the subtraction is done with doubles. If you want integer arithmetic, use `(1ULL<<63)-1000`. – rici Jul 22 '18 at 03:06

1 Answer

Because among the floating-point numbers representable in type double, 9223372036854774784 happens to be the closest to the mathematically correct result 9223372036854774808.

Let's inspect the representable neighborhood of your 9223372036854774784:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    double d = 9223372036854774784.0;
    /* Print the previous representable double, d itself, and the next one. */
    printf("%f\n%f\n%f\n", nextafter(d, -DBL_MAX), d, nextafter(d, DBL_MAX));
}

On my platform the output is

9223372036854773760.000000
9223372036854774784.000000
9223372036854775808.000000

Which one would you pick? Your implementation decided to go with 9223372036854774784.

AnT stands with Russia
  • Yeah, it's close but not correct. What if I want the exact result? How do I get it? – rsltgy Jul 21 '18 at 20:08
  • @rsltgy: On your platform you can't get the exact result in `double`. It is not representable in `double`. `9223372036854774784` is as close as you can get. – AnT stands with Russia Jul 21 '18 at 20:10
  • @rsltgy: If you want an exact match, you'll need to use a floating point type with at least 64 bits available to store the mantissa. A standard double is 64 bits in total, and some of those are used for the sign and exponent, so you actually only have 53 bits available to store the mantissa. You might be able to use `long double` and get the result you desire; otherwise, you need to write your own floating point arithmetic package (or find one on the internet, somewhere). You might need an arbitrary-precision package such as [GMP](https://www.gmplib.org/) or one of its relatives. – Jonathan Leffler Jul 21 '18 at 20:12
  • @JonathanLeffler, thank you for the explanation. It still produces the same result even if I use long double. But when I calculate pow(2,53) - 1000, I get the result that I expected, which matches the 53 bits for the mantissa. Otherwise, I need to use the package. – rsltgy Jul 21 '18 at 20:26
  • On some platforms, `long double` is the same as `double`. On others, it isn't. You might also need to apply the right suffix to the constants (`L`, or theoretically `l` but don't ever use that as it invites confusion with `1`) to ensure the calculation is done in `long double` and not just `double`. – Jonathan Leffler Jul 21 '18 at 20:29
  • @Ant, you are saying that it's not representable in double, but we can represent 64 bits in double, 2^63 - 1 for positives. pow(2,63) - 1000 still can't be represented? As @JonathanLeffler says, that's because of the mantissa and exponent. – rsltgy Jul 21 '18 at 20:30
  • @rsltgy: Try: `#include <math.h>`, `#include <stdio.h>`, `int main(void) { long double ld1 = powl(2.0L, 63.0L); long double ld2 = ld1 - 1000.0L; long double ld3 = 9223372036854775808.0L - 1000.0L; printf("%.0Lf\n%.0Lf\n%.0Lf\n", ld1, ld2, ld3); }` which produces `9223372036854775808`, `9223372036854774808`, `9223372036854774808` on a Mac with GCC 8.1.0. Note the careful use of `powl()` and `L` suffixes to the floating point constants. – Jonathan Leffler Jul 21 '18 at 20:38
  • @rsltgy: `double` is not some trivial integer representation. Just because you have 64 bits in `double` does not mean you can represent `2^63-1`. In `double` you trade precision for range. Typical `double` can represent `1.79769e+308`, which is a lot. The price you pay for that huge range is the gaps between the representable numbers. The farther away you get from `0`, the wider these gaps become. And yes, this is because of limited mantissa, which is `53` bits wide in `double`. – AnT stands with Russia Jul 21 '18 at 20:40
  • As an example from the comment by Jonathan Leffler, Microsoft 32 bit and 64 bit C / C++ compilers treat long double the same as double. Microsoft's older 16 bit C compilers implement a proper 80 bit long double. I don't know why support for long doubles was dropped with the later compilers. – rcgldr Jul 22 '18 at 02:29