Why is the result of the C program showing a different result than expected?

Question

When I run this program:

#include <stdio.h>
int main (void)
{
    float x;
    double y;
    x = - 2147483645.0;
    y = -2147483645.0f; 
    printf("%f, %f", x, y);
    return 0;
}

the result is -2147483648.000000, -2147483645.000000

Why is it so?

A `float` has fewer bits for its mantissa (significant digits) than a `double`. — Fiddling Bits, Jul 24 '20 at 13:47
Here is a good paper: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html — Chris Taylor, Jul 24 '20 at 13:50
BTW you have the data type specifiers the wrong way round. A decimal value is `double` so it is the `float` initialiser that needs the `f` suffix. — Weather Vane, Jul 24 '20 at 13:55
Does this answer your question? [What is the difference between float and double?](https://stackoverflow.com/questions/2386772/what-is-the-difference-between-float-and-double) — the busybee, Jul 24 '20 at 14:03

score 4 · Answer 1 · answered Jul 24 '20 at 13:49

Floating-point numbers are imprecise, and a larger type gives more precision. In this case, a double is large enough to store the number exactly, but a float isn't, which means it prints out as the wrong value. Read more here.

Aplet123

Wrong implies an error has occurred. The value isn't wrong. It's the closest float. – stark Jul 24 '20 at 14:08

fcdt · Answer 2 · 2020-07-25T11:21:48.560

The value 2147483645.0 is 1.111111111111111111111111111101∙2³⁰ in binary form, so it needs a 30-bit mantissa. But the float data type offers only a 23-bit mantissa, while double has 52 bits. The sign is stored separately (all of this depends on your platform and your compiler; these values are for standard x86).

Consider this program:

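#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  float x = -2147483645.0;
  double y = -2147483645.0;
  uint32_t xbits;
  uint64_t ybits;

  /* memcpy avoids the undefined behaviour of reading a float via unsigned* */
  memcpy(&xbits, &x, sizeof xbits);
  memcpy(&ybits, &y, sizeof ybits);

  /* zero-padded widths keep leading zeros of the bit pattern visible */
  printf("%f  %08" PRIX32 "\n", x, xbits);
  printf("%f  %016" PRIX64 "\n", y, ybits);
}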

I compiled it with gcc 5.4.0 for x86 and got this output:

-2147483648.000000  CF000000
-2147483645.000000  C1DFFFFFFF400000

The internal format of the numbers in hexadecimal notation can be seen on the right:

float x (32 bits in total):
===========================
Sign:     1
Exponent: 100 1111 0 (bias 127 + 31)
Mantissa: 000 0000 0000 0000 0000 0000

double y (64 bits in total):
============================
Sign:     1
Exponent: 100 0001 1101 (bias 1023 + 30)
Mantissa: 1111 1111 1111 1111 1111 1111 1111 0100 0000 0000 0000 0000 0000

I have grouped the bits here as in the hexadecimal output. The double y stores exactly the binary representation of the number as described above. In contrast, the mantissa of the float x is zero. This is because the excess bits are not simply cut off. Instead, the value is rounded to the nearest representable float, depending on the excess bits. That's why you got 1.0∙2³¹=2147483648 in the output.

You can also try this out on sites like these.

The rounding is done by the compiler at compile time here, not while the program runs. I don't know of a way to influence this, but you can control the rounding mode within the program, as mentioned here:

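#include <fenv.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#pragma STDC FENV_ACCESS ON

/* Convert y to float under the given rounding mode; print value and bits. */
static void show(const char *name, int mode, volatile double y) {
  float x;
  uint32_t bits;

  fesetround(mode);
  x = (float)y;  /* volatile forces the conversion to happen at run time */
  memcpy(&bits, &x, sizeof bits);  /* avoids reading a float via unsigned* */
  printf("%-15s%f %08" PRIX32 "\n", name, x, bits);
}

int main(void) {
  double y = -2147483645.0;

  show("FE_TONEAREST:", FE_TONEAREST, y);
  show("FE_UPWARD:", FE_UPWARD, y);
  show("FE_DOWNWARD:", FE_DOWNWARD, y);
  show("FE_TOWARDZERO:", FE_TOWARDZERO, y);
}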

Compile with the -lm option. This outputs:

FE_TONEAREST:  -2147483648.000000 CF000000
FE_UPWARD:     -2147483520.000000 CEFFFFFF
FE_DOWNWARD:   -2147483648.000000 CF000000
FE_TOWARDZERO: -2147483520.000000 CEFFFFFF

So if it uses a 23 bit mantissa then I presume it would save the first 23 bits and each of them would be 1. And when we transform that number (1.11111111111111111111111 * 2^30) into its decimal equivalent we would get 2147483520. Why isn't that the result? — Kingsman54321, Jul 24 '20 at 16:09
I've added some further explanation to the post: The excess bits aren't just cut off. The value is rounded depending on these bits, so you get `1.0∙2³¹=2147483648` (and it's `2³¹` instead of `2³⁰`, my mistake) — fcdt, Jul 24 '20 at 18:52
You were right the first time - it is 1.111111111111111111111111111101∙2^30, not 1.111111111111111111111111111101∙2^31. I think I finally got it. Mantissa has 23 bits but if we need more than that, you need to check the 24th bit that you would put if you had a 24 bit mantissa. If that bit is 0, then you do nothing. You have your 23 bit mantissa. But if it is 1, then you need to add 1 to the whole mantissa just like when doing addition with binary numbers. — Kingsman54321, Jul 24 '20 at 20:26
And because of that, my logic is (considering the number in my example) that you have: 1.1111 1111 1111 1111 1111 111 + 0.0000 0000 0000 0000 0000 001 and that is 10.00000000000000000000000 * 2^30 or 2^31 and that is our number. You actually need to add 1 to the whole mantissa including the hidden bit. — Kingsman54321, Jul 24 '20 at 20:33
After doing some further research, I can say that the rule I mentioned in the previous comment doesn't work. It does somehow round the bits in mantissa but I don't get the rule. Any idea how rounding really works? — Kingsman54321, Jul 25 '20 at 09:17
The rule seems to be "round to nearest" here, but this is done by your compiler at compile time. I've added another code example to the post. — fcdt, Jul 25 '20 at 11:24
