
I have this calculation in my C++ code: r=(float)736778906400/100. Obviously the answer should be 7367789064, but the compiler returns 7367789056. What is the problem?

sasasasa

1 Answer


When you do:

(float)736778906400/100

You first cast 736778906400 to a float and then divide by 100, so you get two rounding errors:

  1. 736778906400 is not exactly representable as a 32-bit float (the most likely size of float);

  2. the exact result of float(736778906400) / 100 is not exactly representable as a 32-bit float either.

Representation of 736778906400 as a float

Assuming you are on a standard architecture that uses IEEE 754 32-bit floating-point values for float, every integer between -16777216 and +16777216 (±2^24) can be represented exactly, but beyond that range only some integers can, because a 32-bit IEEE float has a 23-bit stored mantissa (24 bits of precision counting the implicit leading 1).

736778906400 falls inside the range [2^39, 2^40), where adjacent floats are 2^(39 - 23) = 2^16 = 65536 apart, so the number is rounded to the nearest multiple of 65536, which is 736778911744. You can check this by doing the following:

float x = 736778906400;
std::printf("%.0f\n", x); // prints 736778911744

A double has a 52-bit stored mantissa (53 bits of precision), so it can store every integer between -2^53 and 2^53 exactly; 736778906400 therefore fits exactly inside a `double`.

See, e.g., https://en.wikipedia.org/wiki/Single-precision_floating-point_format for more details on the rounding values of float.

Division of 736778911744 by 100

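100 is exactly representable as a float, so converting it introduces no rounding error. The second error comes from the division itself: IEEE 754 requires the quotient to be computed as if with infinite precision and then rounded to the nearest representable float.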

736778911744 / 100 is exactly 7367789117.44, which lies in the range [2^32, 2^33), where adjacent floats are 2^(32 - 23) = 2^9 = 512 apart, so the value is rounded to the nearest multiple of 512, which is 14390213 * 512 = 7367789056.


Holt