I have this calculation in my C++ code:
r = (float)736778906400/100
Obviously the answer should be 7367789064, but the compiler returns 7367789056.
What is the problem?

- Read about the numerical capacity of a float or just use a double. – Jun 09 '17 at 09:21
- @YvesDaoust oh okay thanks – sasasasa Jun 09 '17 at 09:22
- Mumble mumble c-style mumble – Mr Lister Jun 09 '17 at 10:05
1 Answer
When you do:
(float)736778906400/100
you first cast 736778906400 to a float and then divide by 100, so you get two rounding errors:
- 736778906400 is not exactly representable by a 32-bit float (which is the most likely float size);
- the final result of float(736778906400) / 100 is not exactly representable by a 32-bit float.
Representation of 736778906400 as a float
Assuming you are on a standard architecture that uses IEEE 754 32-bit floating-point values for float, every integer is exactly representable only between -16777216 and +16777216 (2^24, since a 32-bit IEEE float stores 23 mantissa bits plus one implicit leading bit).
736778906400 falls inside the range [2^39, 2^40), where consecutive floats are 2^(39 - 23) = 2^16 = 65536 apart, so the number is rounded to the nearest multiple of 65536, which is 736778911744. You can check this by doing the following:
float x = 736778906400;
std::cout << std::fixed << x << '\n'; // needs <iostream>; prints 736778911744.000000 with IEEE 754 floats
A double has a 52-bit mantissa, so it can exactly store every integer between -2^53 and 2^53, and you can easily store 736778906400 exactly inside a double.
See, e.g., https://en.wikipedia.org/wiki/Single-precision_floating-point_format for more details on the rounding values of float.
Division of 736778911744 by 100
100 is exactly representable by a float, so there is no rounding error on the divisor. The problem comes from the rounding of the quotient that IEEE 754 division performs as its last step.
736778911744 / 100 is exactly 7367789117.44, which is within the range [2^32, 2^33), where consecutive floats are 2^(32 - 23) = 2^9 = 512 apart, so the value is rounded to the nearest multiple of 512, which is 14390213 * 512 = 7367789056.
