
I'm running into a really strange issue with floating point accuracy on ARM64. I have a very simple piece of C++ code that looks something like this:

float sx = some_float_number_1;
float sy = some_float_number_2;
float ex = some_float_number_3;
float ey = some_float_number_4;
float px = ex;
float py = ey;

float d1 = (ex - sx) * (py - sy);
float d2 = (px - sx) * (ey - sy);

float d = d1 - d2;
float t = (ex - sx) * (py - sy) - (px - sx) * (ey - sy);

//32-bit output: d == t == 0
//64-bit output: d == 0, t != 0

In theory, d is supposed to be equal to t and equal to 0, and that's exactly what happened on 32-bit ARM. But for some odd reason, the output of t is not equal to 0 on 64-bit ARM while d is still correct. I have never seen any bug like this, so I have no idea what could've caused this kind of problem.

EDIT: Some more info

  • In case you haven't noticed, both d and t are supposed to be 0, since (ex - sx) * (py - sy) is equal to (px - sx) * (ey - sy)
  • This issue only happens when the fractional part of the input is not equal to 0.
  • The compiler I'm using is Clang included in Android NDK r15c bundle.

EDIT2: Here's the disassembly

4c: 52933348    mov w8, #0x999a                 // #39322
50: 72a82828    movk    w8, #0x4141, lsl #16
54: b90683e8    str w8, [sp,#1664]
58: 52933348    mov w8, #0x999a                 // #39322
5c: 72a82728    movk    w8, #0x4139, lsl #16
60: b9067fe8    str w8, [sp,#1660]
64: 52933348    mov w8, #0x999a                 // #39322
68: 72a838a8    movk    w8, #0x41c5, lsl #16
6c: b9067be8    str w8, [sp,#1656]
70: 529999a8    mov w8, #0xcccd                 // #52429
74: 72a855e8    movk    w8, #0x42af, lsl #16
78: b90677e8    str w8, [sp,#1652]
7c: bd467be0    ldr s0, [sp,#1656]
80: bd0673e0    str s0, [sp,#1648]
84: bd4677e0    ldr s0, [sp,#1652]
88: bd066fe0    str s0, [sp,#1644]
8c: bd467be0    ldr s0, [sp,#1656]
90: bd4683e1    ldr s1, [sp,#1664]
94: 1e213800    fsub    s0, s0, s1
98: bd466fe1    ldr s1, [sp,#1644]
9c: bd467fe2    ldr s2, [sp,#1660]
a0: 1e223821    fsub    s1, s1, s2
a4: 1e210800    fmul    s0, s0, s1
a8: bd066be0    str s0, [sp,#1640]
ac: bd4673e0    ldr s0, [sp,#1648]
b0: bd4683e1    ldr s1, [sp,#1664]
b4: 1e213800    fsub    s0, s0, s1
b8: bd4677e1    ldr s1, [sp,#1652]
bc: bd467fe2    ldr s2, [sp,#1660]
c0: 1e223821    fsub    s1, s1, s2
c4: 1e210800    fmul    s0, s0, s1
c8: bd0667e0    str s0, [sp,#1636]
cc: bd466be0    ldr s0, [sp,#1640]
d0: bd4667e1    ldr s1, [sp,#1636]
d4: 1e213800    fsub    s0, s0, s1
d8: bd0663e0    str s0, [sp,#1632]
dc: bd467be0    ldr s0, [sp,#1656]
e0: bd4683e1    ldr s1, [sp,#1664]
e4: 1e213800    fsub    s0, s0, s1
e8: bd466fe2    ldr s2, [sp,#1644]
ec: bd467fe3    ldr s3, [sp,#1660]
f0: 1e233842    fsub    s2, s2, s3
f4: bd4673e4    ldr s4, [sp,#1648]
f8: 1e243821    fsub    s1, s1, s4
fc: bd4677e4    ldr s4, [sp,#1652]
100:    1e233883    fsub    s3, s4, s3
104:    1e230821    fmul    s1, s1, s3
108:    1f020400    fmadd   s0, s0, s2, s1
10c:    bd065fe0    str s0, [sp,#1628]
bigbadbug
  • Can you give an example? I mean concrete values for which `d` and `t` are "completely wrong" (all 4 inputs and values of `d` and `t`). – geza Jul 01 '18 at 15:11
  • It can be any floating point value, as long as the fractional part of the input is not equal to 0. On 64-bit ARM, the value of t is always wrong, but d is always correct. – bigbadbug Jul 02 '18 at 07:18
  • So, by "completely wrong", you mean it is not zero, but a small number? If that's the case, then Eric's answer is correct, and your description is very misleading, as `t` is not completely wrong, it is just the usual "floating point is imprecise" thing. – geza Jul 02 '18 at 08:54
  • Can you share the disassembly of the 64-bit version? I'm curious what causes the issue exactly. – geza Jul 02 '18 at 08:58
  • I suppose "completely wrong" is a bit of an overstatement. But given that `(ex - sx) * (py - sy)` is supposed to be equal to `(px - sx) * (ey - sy)`, I find it very strange that the output of `t` is not equal to 0 on 64-bit ARM. I'm getting the disassembly right now. – bigbadbug Jul 02 '18 at 09:15
  • I added the disassembly to the question. – bigbadbug Jul 02 '18 at 11:56

2 Answers


The C++ standard allows an implementation to evaluate floating-point expressions with more precision than the type nominally has. It requires implementations to discard the excess precision when a value is assigned to an object.

Thus, in assigning to d1 and d2, the excess precision is discarded and does not contribute to d1 - d2, but, in (ex - sx) * (py - sy) - (px - sx) * (ey - sy), the excess precision participates in the evaluation. Note that C++ not only allows excess precision in the evaluation but allows it to be used for parts of the expression and not others.

In particular, a common way to evaluate an expression like a*b - c*d is to compute -c*d with a multiply instruction (that does not use excess precision) and then compute a*b - c*d with a fused multiply-add instruction that uses effectively infinite precision for the multiplication.

Your compiler may have a switch to disable this behavior and always use the nominal precision.

Eric Postpischil
  • But `(ex - sx) * (py - sy)` is supposed to be equal to `(px - sx) * (ey - sy)` since `px` and `py` are assigned the same values as `ex` and `ey` respectively. Why would the output of `(ex - sx) * (py - sy)` not be the same as `(px - sx) * (ey - sy)` when placed in one single expression? Btw, I'm using Clang and it does not seem to have an option to disable excess floating point precision. – bigbadbug Jul 02 '18 at 08:51
  • @bigbadbug: C++ not only allows an implementation to evaluate floating-point expressions with more precision but allows it to use it at some places in the expression and not others. Thus, in `(ex - sx) * (py - sy) - (px - sx) * (ey - sy)`, one of the products may be computed with a simple multiply instruction (which does not use extra precision) while the remaining product is then computed and added/subtracted to the first product with a fused multiply-add or fused multiply-subtract instruction, which uses extra precision. – Eric Postpischil Jul 02 '18 at 11:00
  • @EricPostpischil you are right, I should say that `a + b == a + b` may be false ([ref](https://stackoverflow.com/questions/24442725/is-floating-point-addition-commutative-in-c)) so certainly it should not be expected that the two different multiplications in bigbadbug's comment should be equal – M.M Jul 02 '18 at 11:27
  • @M.M: because of your (now deleted) previous comment, I've asked this question: https://stackoverflow.com/questions/51134021/floating-point-equality. It seems that C++ standard doesn't guarantee that `if (a==a)` is true. It is only true, if the implementation follows IEEE-754 (which is mostly true nowadays, I suppose) – geza Jul 02 '18 at 12:04

Analyzing the disassembly you gave, I think I found the reason: fused multiply-add (it's the fmadd instruction). I haven't analyzed the disassembly fully, but I think that t is evaluated like this:

t = fma(ex - sx, py - sy, (sx - px) * (ey - sy));

where the meaning of fma is (this matches std::fma in the standard library):

fma(a, b, c) = a*b + c;

So the calculation differs a little bit, because fma rounds only once while evaluating a*b + c: the product a*b is kept at full precision inside the fused operation, while the other product (the c argument) has already been rounded by a plain multiply. Without fma, there are two roundings: first at x = a*b, then at x + c.

You likely have a compiler switch that turns this feature on, like -ffast-math or -ffp-contract=on/fast (on some targets, Clang allows contraction within an expression by default).

Turn off FMA contraction, and I think your problem will be solved.

(Now I see that Eric answered the same thing, but I leave my answer here, maybe it helps a little bit to understand the issue)

geza