There is a difference between using a suffix and using a cast: 8388608.5000000009f and (float) 8388608.5000000009 have different values in common C implementations. This code:
```c
#include <stdio.h>

int main(void)
{
    float x = 8388608.5000000009f;         /* rounded once: decimal -> float */
    float y = (float) 8388608.5000000009;  /* rounded twice: decimal -> double -> float */
    printf("%.9g - %.9g = %.9g.\n", x, y, x-y);
}
```
prints “8388609 - 8388608 = 1.” in Apple Clang 11.0 and other implementations that use IEEE-754 binary32 for float and binary64 for double with correct rounding. (The C standard permits implementations to use methods other than IEEE-754 correct rounding, so other C implementations may produce different results.)
The reason is that (float) 8388608.5000000009 involves two rounding operations. With the suffix, 8388608.5000000009f is converted directly to float. Floats near 8388608 (which is 2^23) are spaced 1 apart, because float has a 24-bit significand, so the portion that must be discarded to fit in a float, .5000000009, is examined directly to see whether it is greater than half that spacing, .5. It is, so the result is rounded up to the next representable value, 8388609.
Without the suffix, 8388608.5000000009 is first converted to double. The portion that must be discarded there, .0000000009, is less than half the value of the low bit at the point of truncation. (The value of the low bit there is .00000000186264514923095703125, and half of it is .000000000931322574615478515625.) So the result is rounded down, and we have 8388608.5 as a double.
When the cast rounds this to float, the portion that must be discarded is exactly .5, so the value lies exactly halfway between the representable numbers 8388608 and 8388609. The rule for breaking ties (round half to even) chooses the value whose low bit is even, 8388608.
(Another example is “7.038531e-26”; (float) 7.038531e-26 is not equal to 7.038531e-26f. This is the only such numeral with fewer than eight significant digits when float is binary32 and double is binary64, except of course “-7.038531e-26”.)