Interchangeability of IEEE 754 floating-point addition and multiplication

Question

Is the addition x + x interchangeable by the multiplication 2 * x in IEEE 754 (IEC 559) floating-point standard, or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?

#include <limits>

template <typename T>
T case_add(T x, size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    T result(x);

    for (size_t i = 1; i < n; ++i)
    {
        result += x;
    }

    return result;
}

template <typename T>
T case_mul(T x, size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    return x * static_cast<T>(n);
}

Note that there seem to be many ways to sum n*x, but surprisingly so many are equivalent! This is somehow related to http://stackoverflow.com/questions/21690585/is-3xx-always-exact and http://stackoverflow.com/questions/21676955/floating-point-product-expansion-equivalence — aka.nice, Oct 08 '16 at 08:12

Simon Byrne · Accepted Answer · 2016-10-04T19:06:13.937

Is the addition x + x interchangeable by the multiplication 2 * x in IEEE 754 (IEC 559) floating-point standard

Yes, since they are both mathematically identical, they will give the same result (since the result is exact in floating point).

or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?

Not generally, no. From what I can tell, it seems to hold for n <= 5:

n=3: as x+x is exact (i.e. involves no rounding), so (x+x)+x only involves one rounding at the final step.
n=4 (and you're using the default rounding mode) then
- if the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
- if the last 2 bits are 01, then the exact value of x+x+x will have last 2 bits of 1|1 (where | indicates the final bit in the format), which will be rounded up to 0|0. The next addition will give an exact result |01, so the result will be rounded down, cancelling out the previous error.
- if the last 2 bits are 11, then the exact value of x+x+x will have last 2 bits of 0|1, which will be rounded down to 0|0. The next addition will give an exact result |11, so the result will be rounded up, again cancelling out the previous error.
n=5 (again, assuming default rounding): since x+x+x+x is exact, it holds for the same reason as n=3.

For n=6 it fails, e.g. take x to be 1.0000000000000002 (the next double after 1.0), in which case 6x is 6.000000000000002 and x+x+x+x+x+x is 6.000000000000001

Wow your proof is much shorter than my proof by case analysis on whether 3*x is in the same binade as 2*x or 4*x. — Pascal Cuoq, Oct 04 '16 at 16:19
From what I can tell based on exhaustive tests with IEEE-754 `binary32` (= `float`), multiplication by *n* is identical to *(n-1)* repeated additions when *n* ≤ 5 for `FE_TONEAREST`, *n* ≤ 3 for `FE_TOWARDZERO`, *n* ≤ 3 for `FE_DOWNWARD`, and *n* ≤ 3 for `FE_UPWARD`. Is the proof extendable to directed rounding modes? — njuffa, Oct 04 '16 at 17:52
@njuffa The argument for n<=3 was independent of the choice of rounding mode. The proof became dependent on the rounding mode at the n=4 step. — Patricia Shanahan, Oct 04 '16 at 18:33
@PatriciaShanahan Thanks for pointing that out, I had overlooked that the answer already addresses the issue of rounding mode. — njuffa, Oct 04 '16 at 18:35
It would be great if the answer would include the cases for non-default rounding modes mentioned by @njuffa. — plasmacel, Oct 07 '16 at 17:14

Mark B · Answer 2 · 2016-10-04T15:24:34.960

3

If n is for example pow(2, 54) then the multiplication will work just fine, but in the addition path once the result value is sufficiently larger than the input x, result += x will yield result.

edited Oct 04 '16 at 15:24

answered Oct 04 '16 at 15:18

Mark B

95,107
10
109
188

score 1 · Answer 3 · answered Oct 04 '16 at 15:27

Yes, but it doesn't hold generally. Multiplication by a number higher than 2 might not give the same results, as you have changed the exponent and can drop a bit if you replace with adds. Multiplication by two can't drop a bit if replaced by add operations, however.

score 1 · Answer 4 · answered Oct 04 '16 at 15:32

If the accumulator result in case_add becomes too large, adding x will introduce rounding errors. At a certain point, adding x won't have an effect at all. So the functions won't give the same result.

For example if double x = 0x1.0000000000001p0 (hexadecimal float notation):

n  case_add              case_mul

1  0x1.0000000000001p+0  0x1.0000000000001p+0
2  0x1.0000000000001p+1  0x1.0000000000001p+1
3  0x1.8000000000002p+1  0x1.8000000000002p+1
4  0x1.0000000000001p+2  0x1.0000000000001p+2
5  0x1.4000000000001p+2  0x1.4000000000001p+2
6  0x1.8000000000001p+2  0x1.8000000000002p+2

Interchangeability of IEEE 754 floating-point addition and multiplication

4 Answers4