The Cases of 0.8−0.7
In 0.8-0.7 == 0.1, none of the literals are exactly representable in double. The nearest representable values are 0.8000000000000000444089209850062616169452667236328125 for 0.8, 0.6999999999999999555910790149937383830547332763671875 for 0.7, and 0.1000000000000000055511151231257827021181583404541015625 for 0.1. When the first two are subtracted, the result is 0.100000000000000088817841970012523233890533447265625. As this is not equal to the third, 0.8-0.7 == 0.1 evaluates to false.
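The exact values quoted above can be reproduced in any language whose floating-point type is IEEE-754 binary64. The following is a minimal sketch in Python rather than C#, on the assumption that Python's float uses the same format as C#'s double (which holds on common platforms). decimal.Decimal, given a float, yields the exact decimal expansion of the stored value.

```python
from decimal import Decimal

# Assumption: Python's float is IEEE-754 binary64, like C#'s double.
# Decimal(x) converts the stored binary value exactly, with no rounding.
a = Decimal(0.8)
b = Decimal(0.7)
c = Decimal(0.1)
diff = Decimal(0.8 - 0.7)  # exact value of the computed double difference

print(a)     # nearest double to 0.8
print(b)     # nearest double to 0.7
print(c)     # nearest double to 0.1
print(diff)  # the double produced by the subtraction
print(0.8 - 0.7 == 0.1)
```

The last line prints False, because diff and c are different doubles.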
In (float)(0.8-0.7) == (float)(0.1), the result of 0.8-0.7 and the value of 0.1 are each converted to float. The float value nearest to the former, 0.100000000000000088817841970012523233890533447265625, is 0.100000001490116119384765625. The float value nearest to the latter, 0.1000000000000000055511151231257827021181583404541015625, is also 0.100000001490116119384765625. Since these are the same, (float)(0.8-0.7) == (float)(0.1) evaluates to true.
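The float case can be imitated outside C# by rounding each double to binary32. The sketch below does this in Python with the struct module. The helper name to_float32 is made up for this illustration, and the sketch assumes that packing with the "f" format rounds to the nearest binary32 value, as CPython does on common platforms.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (hypothetical helper)."""
    # Packing as a 4-byte float rounds; unpacking widens back to binary64 exactly.
    return struct.unpack("<f", struct.pack("<f", x))[0]

lhs = to_float32(0.8 - 0.7)  # stands in for (float)(0.8-0.7)
rhs = to_float32(0.1)        # stands in for (float)(0.1)

print(Decimal(lhs))
print(Decimal(rhs))
print(lhs == rhs)
```

Both doubles lie well within half a float spacing of the same float, so the two conversions collapse to one value and the comparison prints True.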
In (double)(0.8-0.7) == (double)(0.1), the result of 0.8-0.7 and the value of 0.1 are each converted to double. Since they are already double, there is no effect, and the result is the same as for 0.8-0.7 == 0.1.
Notes
The C# specification, version 5.0, indicates that float and double are the IEEE-754 32-bit and 64-bit floating-point types. I do not see it explicitly state they are the binary floating-point formats rather than decimal formats, but the characteristics described make this evident. The specification also states that IEEE-754 arithmetic is generally used, with round-to-nearest (presumably round-to-nearest-ties-to-even), subject to the exception below.
The C# specification allows floating-point arithmetic to be performed with more precision than the nominal type. Clause 4.1.6 says “… Floating-point operations may be performed with higher precision than the result type of the operation…” This can complicate analysis of floating-point expressions in general, but it does not concern us in the instance of 0.8-0.7 == 0.1 because the only applicable operation is the subtraction of 0.7 from 0.8, and these numbers are in the same binade (have the same power of two in the floating-point representation), so the result of the subtraction is exactly representable and additional precision will not change the result. As long as the conversion of the source texts 0.8, 0.7, and 0.1 to double does not use extra precision and the cast to float produces a float with no extra precision, the results will be as stated above. (The C# standard says in clause 6.2.1 that a conversion from double to float yields a float value, although it does not explicitly state that no extra precision may be used at this point.)
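The claim that the subtraction of 0.7 from 0.8 is exact can be checked with rational arithmetic. The sketch below is in Python, assuming its float is IEEE-754 binary64 like C#'s double. It uses math.frexp to compare the binary exponents of the two operands and fractions.Fraction to compare the mathematically exact difference with the computed one.

```python
import math
from fractions import Fraction

x, y = 0.8, 0.7

# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
# so equal exponents mean the two doubles lie in the same binade.
same_binade = math.frexp(x)[1] == math.frexp(y)[1]

# Fraction(x) is the exact rational value of the double x.
exact = Fraction(x) - Fraction(y)   # difference with no rounding at all
computed = Fraction(x - y)          # difference as produced in double

print(same_binade)
print(exact == computed)
```

Both lines print True: the double subtraction introduces no rounding error here, so carrying extra precision through it would have nothing to change.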
Additional Cases
In 8-0.7 == 7.3, we have 8 for 8, 7.29999999999999982236431605997495353221893310546875 for 7.3, 0.6999999999999999555910790149937383830547332763671875 for 0.7, and 7.29999999999999982236431605997495353221893310546875 for 8-0.7, so the result is true.
Note that the additional precision allowed by the C# specification could affect the result of 8-0.7. A C# implementation that used extra precision for this operation could produce false for this case, as it would get a different result for 8-0.7.
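To see why extra precision could matter for 8-0.7, it helps to compare the exact difference of the two doubles with the value the double subtraction returns. The sketch below does so in Python, assuming its float is IEEE-754 binary64 and that the subtraction is rounded once to double (no extended intermediate precision). It does not simulate an extended-precision C# implementation. It only shows that rounding takes place.

```python
from fractions import Fraction

exact = Fraction(8) - Fraction(0.7)  # exact difference of the two doubles
rounded = Fraction(8 - 0.7)          # the same difference rounded to double

print(exact == rounded)              # whether the subtraction was exact
print(8 - 0.7 == 7.3)                # the comparison from the text
print(float(abs(exact - rounded)))   # size of the rounding error
```

The first line prints False and the second True: the exact difference is not a double, and rounding it happens to land on the double nearest 7.3. An implementation that kept the unrounded (or less rounded) difference would compare a different value against 7.3.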
In 18.01-0.7 == 17.31, we have 18.010000000000001563194018672220408916473388671875 for 18.01, 0.6999999999999999555910790149937383830547332763671875 for 0.7, 17.309999999999998721023075631819665431976318359375 for 17.31, and 17.31000000000000227373675443232059478759765625 for 18.01-0.7, so the result is false.
How is subtracting from 8 different from subtracting from 18.01 if a floating-point number is subtracted in both cases?
18.01 is larger than 8 and requires a greater power of two in its floating-point representation. Similarly, the result of 18.01-0.7 is larger than that of 8-0.7. This means the bits in their significands (the fraction portion of the floating-point representation, which is scaled by the power of two) represent greater values, causing the rounding errors in the floating-point operations to be generally greater. In general, a floating-point format has a fixed span: there is a fixed distance from the high bit retained to the low bit retained. When you change to numbers with more bits on the left (high bits), some bits on the right (low bits) are pushed out, and the results change.
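The difference in scale can be made concrete by looking at the spacing between adjacent doubles near each result. The sketch below uses Python's math.ulp (available from Python 3.9), again assuming Python's float matches C#'s double.

```python
import math

# math.ulp(x) is the gap between x and the next larger double.
ulp_small = math.ulp(8 - 0.7)      # result lies in [4, 8)
ulp_large = math.ulp(18.01 - 0.7)  # result lies in [16, 32)

print(ulp_small == 2.0 ** -50)
print(ulp_large == 2.0 ** -48)
print(ulp_large / ulp_small)       # how much coarser the grid is

# The computed 18.01-0.7 and the double nearest 17.31 are neighbours on that grid.
print((18.01 - 0.7) - 17.31 == ulp_large)
```

Near 17.31 the representable values are four times as far apart as near 7.3, and the computed 18.01-0.7 lands exactly one step above the double nearest 17.31, which is why that comparison fails while 8-0.7 == 7.3 succeeds.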