I suppose the purported “expected behavior” of this program is to add .0001 to a sum initialized to zero 10,000 times, with all arithmetic performed exactly, as in real-number mathematics, yielding 1. The actual behavior is to convert the decimal numeral “.0001” to a double (likely an IEEE-754 64-bit binary floating-point value), then to convert that value to a float (likely an IEEE-754 32-bit binary floating-point value), then to add that float to a sum 10,000 times, using float arithmetic each time. Thus, the actual behavior has potential rounding errors in three places: when converting the numeral to a double, when converting the double to a float, and in each addition.
One way to avoid error in this situation is to use integer arithmetic. Instead of setting `float x` to .0001, we could set `int x` to 1. Similarly, `y` would be an int, and we would use integer arithmetic throughout the loop. After obtaining the final sum, we would convert it to float and then undo the scaling that allowed us to use integer arithmetic: since we were adding 1 instead of .0001, we have to divide the final result by `10000.f` to adjust it. (This technique will not completely avoid error in all situations, and there are other techniques for reducing error in other situations.)
There is no catastrophic cancellation, because there is no cancellation. Cancellation occurs when two numbers are added or subtracted to produce a smaller result (thus, when adding two numbers of opposite sign, such as adding +8 and -6 to get +2, or subtracting two numbers of the same sign, such as subtracting +6 from +8 to get +2). Catastrophic cancellation occurs when the result is much smaller than the original two numbers. In this situation, the value we are working with gets much smaller, but any error that was in the original numbers typically stays the same size, so the error is much larger relative to the value we are working with. E.g., suppose we should subtract 8 from 8.01 and get .01, but, due to a small error, we have 7.99 in place of 8. Subtracting 7.99 from 8.01 yields .02. This result, .02, is twice the desired result, .01, so the relative error has grown from about 0.1% in the input to 100% in the result.