When I run this program:

#include <stdio.h>
int main (void)
{
    float x;
    double y;
    x = -2147483645.0;
    y = -2147483645.0f;
    printf("%f, %f", x, y);
    return 0;
}

the result is -2147483648.000000, -2147483645.000000

Why is it so?

2 Answers

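Floating-point types have limited precision, and a larger type has more of it. On typical IEEE 754 platforms a float has a 24-bit significand (23 stored bits plus an implicit leading 1), so it cannot represent every integer above 2²⁴ = 16777216, while a double has 53 bits, which is enough to store 2147483645 exactly. That is why the float prints as the nearest value it can hold, -2147483648. Read more here.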

Aplet123

The value 2147483645.0 is 1.111111111111111111111111111101∙2³⁰ in binary form, so it needs a 30-bit mantissa. But the float data type offers only a 23-bit mantissa, while double has 52 bits. The sign is stored separately (this also depends on your platform and your compiler; these values are for standard x86).

Consider this program:

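#include <stdio.h>
#include <string.h>
#include <inttypes.h>

int main(void) {
  float x = -2147483645.0;
  double y = -2147483645.0;
  uint32_t xbits;
  uint64_t ybits;

  /* copy the raw bytes; casting the pointers would break aliasing rules */
  memcpy(&xbits, &x, sizeof xbits);
  memcpy(&ybits, &y, sizeof ybits);

  printf("%f  %08" PRIX32 "\n", x, xbits);
  printf("%f  %016" PRIX64 "\n", y, ybits);
}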

I compiled it with gcc 5.4.0 for x86 and get as output:

-2147483648.000000  CF000000
-2147483645.000000  C1DFFFFFFF400000

The internal format of the numbers in hexadecimal notation can be seen on the right:

float x (32 bits in total):
===========================
Sign:     1
Exponent: 100 1111 0 (bias 127 + 31)
Mantissa: 000 0000 0000 0000 0000 0000

double y (64 bits in total):
============================
Sign:     1
Exponent: 100 0001 1101 (bias 1023 + 30)
Mantissa: 1111 1111 1111 1111 1111 1111 1111 0100 0000 0000 0000 0000 0000

I have grouped the bits here as in the hexadecimal output. The double y stores exactly the binary representation of the number as described above. In contrast, the mantissa of the float x is zero. This is because the excess bits are not simply cut off. Instead, the value is rounded to the nearest representable float. The two candidates are 1.11111111111111111111111∙2³⁰ = 2147483520 and 1.0∙2³¹ = 2147483648; the first is 125 away from 2147483645, the second only 3. That's why you got 1.0∙2³¹ = 2147483648 in the output.

You can also try this out on sites like these.

Here the rounding is done by the compiler when it converts the constant, not while the program runs. I don't know a way to influence this, but you can control the rounding mode within the program, as mentioned here:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

/* Return the raw bit pattern of a float (memcpy avoids aliasing problems). */
static unsigned bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return (unsigned)u;
}

int main(void) {
  float x;
  volatile double y = -2147483645.0;  /* volatile: convert at run time */

  fesetround(FE_TONEAREST);
  x = y;
  printf("FE_TONEAREST:  %f %X\n", x, bits(x));

  fesetround(FE_UPWARD);
  x = y;
  printf("FE_UPWARD:     %f %X\n", x, bits(x));

  fesetround(FE_DOWNWARD);
  x = y;
  printf("FE_DOWNWARD:   %f %X\n", x, bits(x));

  fesetround(FE_TOWARDZERO);
  x = y;
  printf("FE_TOWARDZERO: %f %X\n", x, bits(x));
}

Compile with the -lm option. This outputs:

FE_TONEAREST:  -2147483648.000000 CF000000
FE_UPWARD:     -2147483520.000000 CEFFFFFF
FE_DOWNWARD:   -2147483648.000000 CF000000
FE_TOWARDZERO: -2147483520.000000 CEFFFFFF
fcdt
  • So if it uses a 23 bit mantissa then I presume it would save the first 23 bits and each of them would be 1. And when we transform that number (1.11111111111111111111111 * 2^30) into its decimal equivalent we would get 2147483520. Why isn't that the result? – Kingsman54321 Jul 24 '20 at 16:09
  • I've added some further explanation to the post: The excess bits aren't just cut off. The value is rounded depending on these bits, so you got `1.0∙2³¹=2147483648` (and it's `2³¹` instead of `2³⁰`, my mistake) – fcdt Jul 24 '20 at 18:52
  • You were right the first time - it is 1.111111111111111111111111111101∙2^30, not 1.111111111111111111111111111101∙2^31. I think I finally got it. Mantissa has 23 bits but if we need more than that, you need to check the 24th bit that you would put if you had a 24 bit mantissa. If that bit is 0, then you do nothing. You have your 23 bit mantissa. But if it is 1, then you need to add 1 to the whole mantissa just like when doing addition with binary numbers. – Kingsman54321 Jul 24 '20 at 20:26
  • And because of that, my logic is (considering the number in my example) that you have: 1.1111 1111 1111 1111 1111 111 + 0.0000 0000 0000 0000 0000 001 and that is 10.00000000000000000000000 * 2^30 or 2^31 and that is our number. You actually need to add 1 to the whole mantissa including the hidden bit. – Kingsman54321 Jul 24 '20 at 20:33
  • After doing some further research, I can say that the rule I mentioned in the previous comment doesn't work. It does somehow round the bits in mantissa but I don't get the rule. Any idea how rounding really works? – Kingsman54321 Jul 25 '20 at 09:17
  • The rule seems to be "round to nearest" here, but this is done by your compiler at compile time. I've added another code example to the post. – fcdt Jul 25 '20 at 11:24