
I have three numbers with precise representation using (32-bit) floats:

x = 16277216, y = 16077216, z = -261692320000000

I expect a fused multiply-add x*y + z to return the mathematically correct value, rounded. The correct mathematical value is -2489344, which needs no rounding, so that should be the output of the fused multiply-add. But when I perform fma(x, y, z) the result is -6280192 instead. Why?

I'm using Rust. Note that z is the rounded result of -x*y.

```rust
let x: f32 = 16277216.0;
let y: f32 = 16077216.0;
let z = - x * y;
assert_eq!(z, -261692320000000.0 as f32); // pass
let result = x.mul_add(y, z);
assert_eq!(result, -2489344.0 as f32); // fail

println!("x: {:>32b}, {}", x.to_bits(), x);
println!("y: {:>32b}, {}", y.to_bits(), y);
println!("z: {:>32b}, {}", z.to_bits(), z);
println!("result: {:>32b}, {}", result.to_bits(), result);
```

The output is

```
x:  1001011011110000101111011100000, 16277216
y:  1001011011101010101000110100000, 16077216
z: 11010111011011100000000111111110, -261692320000000
result: 11001010101111111010100000000000, -6280192
```
Joseph Johnston
  • Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – cafce25 Jan 14 '23 at 17:20
  • 1
    @cafce25 I think not. The inaccuracy `0.1+0.2!=0.3` occurs because these numbers must be rounded when encoded as a float, but the integer values I've listed, in particular the target value `-2489344` need not be rounded. – Joseph Johnston Jan 14 '23 at 17:28
  • 1
    Print the result of `z` with more digits. It is very likely not 261692320000000.0 -- in your `assert_eq` that decimal number gets rounded to the nearest `f32` which equals `z`. – chtz Jan 14 '23 at 17:38
  • you are basically doing x * y - x * y which is `0.0` – Jakub Dóka Jan 14 '23 at 17:39
  • 1
    Take a look at the value of `z`: if you expand it to an `f64`, it's -261692323790848.0. The point of the link is that you just can't rely on floating-point math to be perfectly accurate, because it might even change depending on the CPU you're running on. – cafce25 Jan 14 '23 at 17:42
  • @JakubDóka this is (supposed to be) `round(x*y - round(x*y))` – Joseph Johnston Jan 14 '23 at 17:51
  • @cafce @chtz So basically you're both pointing out how `z` as an f64 contains more digits than the f32. Indeed this is the case. So then it seems the compiler is breaking the specification contract, assuming that by preserving the extra precision (probably using f64 under the hood rather than f32) they're helping you... – Joseph Johnston Jan 14 '23 at 17:54
  • 2
    No, -261692320000000 is not the value of `z` at any point in your program. It is, however, rounded to that value when you print it with its implementation of `Display` or `Debug` – cafce25 Jan 14 '23 at 17:56
  • 1
    Try `println!("z: {:>32b}, {:.1}", z.to_bits(), z);` – chtz Jan 14 '23 at 18:00
  • 1
    I see now, so the values printed for `z` differ. The bit value is correct (solving the problem), the integer value is incorrect (misleading me), though I'm not sure why it would print so incorrectly... – Joseph Johnston Jan 14 '23 at 18:03
  • @MarkDickinson the bit value of `z` is `-261692323790848` (which is exactly representable), so why would it print `-261692320000000` instead? – Joseph Johnston Jan 14 '23 at 18:08
  • 1
    @cafce25: The fact that floating-point arithmetic rounds results to representable values does not mean that questions about floating-point arithmetic are duplicates of that question. Questions about specific operations should be explained. – Eric Postpischil Jan 14 '23 at 18:10
  • It's simply the default precision for the `Debug`/`Display` implementation of f32; you shouldn't depend on or care about those digits anyway. – cafce25 Jan 14 '23 at 18:10
  • @cafce25 ok, I think there's no question left, thank you. – Joseph Johnston Jan 14 '23 at 18:11

1 Answer


> I have three numbers with precise representation using (32-bit) floats:
>
> x = 16277216, y = 16077216, z = -261692320000000

This premise is false. -261,692,320,000,000 cannot be represented exactly in any 32-bit floating-point format because its significand requires 37 bits to represent.

The IEEE-754 binary32 format commonly used for `f32` has a 24-bit significand. Scaling the significand of −261,692,320,000,000 to be under 2^24 in magnitude yields −261,692,320,000,000 = −15,598,077.7740478515625•2^24. As we can see, the significand is not an integer at this scale, so it cannot be represented exactly, and I would not call it precise either. The closest representable value is −15,598,078•2^24 = −261,692,323,790,848.
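A quick way to see the value actually stored is to widen the `f32` to an `f64`, which holds every `f32` value exactly (a sketch, not part of the original program):

```rust
fn main() {
    // The decimal literal is rounded to the nearest f32 when parsed.
    let z: f32 = -261_692_320_000_000.0;

    // f64 represents every f32 exactly, so printing the widened value
    // reveals the number actually stored.
    println!("{}", z as f64); // -261692323790848

    // That stored value is the significand 15_598_078 scaled by 2^24.
    assert_eq!(z as f64, -15_598_078.0 * (1u64 << 24) as f64);
}
```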

```rust
println!("z: {:>32b}, {}", z.to_bits(), z);
```

prints

```
z: 11010111011011100000000111111110, -261692320000000
```

Rust is misleading you here; the value of z is not -261692320000000. Rust's default `Display` implementation prints the shortest decimal string that parses back to the same `f32`, and -261692320000000 round-trips to the same bits even though it is not the exact stored value. The actual value of z is −261,692,323,790,848.

The value of 16,277,216•16,077,216 − 261,692,323,790,848 using ordinary real-number arithmetic is −6,280,192, so that result for the FMA is correct.

The rounding error occurred in let z = - x * y;, where multiplying 16,277,216 and 16,077,216 rounded the real-number-arithmetic result of 261,692,317,510,656 to the nearest value representable in binary32, 261,692,323,790,848.
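The whole chain can be checked with exact integer arithmetic, since every quantity involved is an integer that fits in an `i128` (a sketch using the constants derived above):

```rust
fn main() {
    let x: i128 = 16_277_216;
    let y: i128 = 16_077_216;

    // Exact product, before any floating-point rounding.
    let exact_product = x * y;
    assert_eq!(exact_product, 261_692_317_510_656);

    // What `let z = - x * y;` actually stores: -x*y rounded to f32.
    let z_rounded: i128 = -261_692_323_790_848;

    // The FMA rounds the exact sum x*y + z; here that sum is an
    // integer below 2^24, so it is representable and no rounding occurs.
    assert_eq!(exact_product + z_rounded, -6_280_192);

    // The floating-point FMA agrees with the exact computation.
    let result = (16_277_216.0f32).mul_add(16_077_216.0, -261_692_323_790_848.0);
    assert_eq!(result, -6_280_192.0);
}
```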

Eric Postpischil
  • [There's an open issue with some debate](https://github.com/rust-lang/rust/issues/63171) for how f32 formatting works. The argument given for the currently printed zeroes, as I understand it (I don't agree), is that “actual value” printing would give the misleading appearance of additional significant figures. – Kevin Reid Jan 14 '23 at 20:08
  • 3
    @KevinReid: They are using an incorrect model of floating-point. The appearance of additional digits is not misleading because it is correct: Per IEEE-754, each floating-point datum that is not a NaN represents one number. It represents exactly that number and not others and is not an approximation. For binary formats, it is not a representation of some small number of decimal digits. It is the floating-point operations, not the numbers, that round their results. Floating-point numbers ought to be treated as exact, because they are. That is the specification. People should follow it. – Eric Postpischil Jan 14 '23 at 20:22
  • 1
    I agree with you, as it happens, but the argument is only useful on the issue, not directed at me. I'm just providing the link to context. – Kevin Reid Jan 14 '23 at 22:04