When I add a bunch of floating-point numbers with JavaScript, what is the error bound on the sum? What error bound should be used to check if two sums are equal?
In a simple script, I add a bunch of floating-point numbers and compare the sums. I notice that sometimes the result is not correct (two sums that should be equal are not). I am pretty weak at numerical analysis, but even after reviewing Is floating point math broken?, What Every Computer Scientist Should Know About Floating-Point Arithmetic, and Comparing Floating Point Numbers, 2012 Edition, I am still confused about how best to compare floating-point sums in JavaScript.
First, I was confused by this statement from Goldberg: "The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded", i.e. computed exactly and then rounded to the nearest floating-point number. If JavaScript uses IEEE arithmetic, how can 0.1 + 0.2 != 0.3?
I think I answered this for myself: it's easier for me to think about an example in base 10. If 1/3 is approximated as 0.333...333 and 2/3 is approximated as 0.666...667, then 1/3 + 1/3 = 0.666...666 is exactly rounded (it is the exact sum of the two approximations) but != 0.666...667. Even with exactly rounded operations, the operands are already rounded approximations (and intermediate results get rounded again), so error can still creep in.
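A quick check in a console seems to confirm this for 0.1 + 0.2: the stored values are already approximations, and the exactly rounded sum of the first two is not the same double as the stored approximation of 0.3 (toPrecision just shows more digits of what is actually stored):

```js
console.log(0.1 + 0.2);                    // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);            // false
console.log((0.1).toPrecision(20));        // 0.10000000000000000555
console.log((0.2).toPrecision(20));        // 0.20000000000000001110
console.log((0.1 + 0.2).toPrecision(20));  // 0.30000000000000004441
console.log((0.3).toPrecision(20));        // 0.29999999999999998890
```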
How big is machine epsilon? JavaScript numbers are apparently 64-bit IEEE 754 doubles, and machine epsilon for that format is apparently 2^-52, about 2.2e-16?
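If I understand correctly, that value is exposed directly as Number.EPSILON (ES2015+), which is exactly 2^-52:

```js
console.log(Number.EPSILON);                        // 2.220446049250313e-16
console.log(Number.EPSILON === Math.pow(2, -52));   // true
console.log(1 + Number.EPSILON > 1);                // true
console.log(1 + Number.EPSILON / 2 > 1);            // false: rounds back to 1
```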
When I add a bunch (n) of floating-point numbers (naive summation, without pairwise or Kahan summation), what is the error bound on the sum? Intuitively it is proportional to n. The worst-case example I can think of (again in base 10) is 2/3 - 1/3 - 1/3 + 2/3 - 1/3 - 1/3 + etc. I think each pass adds 1 unit in the last place (of the summands) to the error while the true sum remains zero, so both the absolute error and the relative error will grow without bound?
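I tried a small experiment (my own sketch, not from any of the articles above) that seems to show the drift from naive summation, comparing it against Kahan (compensated) summation of the same values:

```js
// Naive left-to-right summation of x added n times.
function naiveSum(n, x) {
  let s = 0;
  for (let i = 0; i < n; i++) s += x;
  return s;
}

// Kahan (compensated) summation of the same values.
function kahanSum(n, x) {
  let s = 0, c = 0;            // c carries the low-order bits lost by s
  for (let i = 0; i < n; i++) {
    const y = x - c;
    const t = s + y;
    c = (t - s) - y;           // recover what was lost when y was added to s
    s = t;
  }
  return s;
}

const n = 1e7;
console.log(naiveSum(n, 0.1)); // not exactly 1000000; the error grows with n
console.log(kahanSum(n, 0.1)); // very close to the exact sum of the stored 0.1 values
```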
In the section "Errors in Summation" Goldberg is more precise (error term is bounded by n * machine epsilon * sum of the absolute values) but also points out that if the sum is being done in an IEEE double precision format, machine epsilon is about 1e-16, so n * machine epsilon will be much less than 1 for any reasonable value of n (n much less than 1e16). How can this error bound be used to check if two floating-point sums are equal? What relationship between the sums, 1, 1e-16, n, etc. must be true if they are equal?
Another intuition: if the numbers are all positive (mine are), then although the error term can grow without bound, the relative error will not, because the sum must grow at the same time. In base 10, the worst case I can think of (in which the error term grows fastest while the sum grows slowest) is repeatedly adding 1.000...005 approximated as 1.000...000. Each addition increases the error term by 1/2 ULP (of the summand, 0.000...005) while increasing the sum by 1 unit in the first place. The worst relative error is 4.5 ULP (0.000...045, reached when the sum is 9.000...000), which is (base - 1) / 2 ULP, which would be 1/2 ULP in base 2?
If two floating-point sums should be equal (their exact values are equal), then the absolute difference of the computed sums must be less than twice that error bound, which would be 1 ULP in base 2? So in JavaScript, is the right check Math.abs(a - b) < a * 1e-16 + b * 1e-16?
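Written with Number.EPSILON instead of the rounded 1e-16, I think that check would look like this (a sketch that takes the positive-summands reasoning above at face value):

```js
// Relative-error comparison: treat two sums as equal if they differ by no
// more than machine epsilon relative to their magnitudes.
function sumsNearlyEqual(a, b) {
  return Math.abs(a - b) <= Number.EPSILON * (Math.abs(a) + Math.abs(b));
}
```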
Comparing Floating Point Numbers, 2012 Edition describes another technique for comparing floating-point numbers, also based on relative error. In JavaScript, is it possible to find the number of representable numbers between two floating-point numbers?
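For example, is something like the following reasonable? (My own sketch using DataView and BigInt to reinterpret the 64 bits of each double, with the usual sign-magnitude-to-ordered-integer mapping so that adjacent doubles are exactly 1 apart.)

```js
// Map a double's bit pattern to an unsigned integer that increases
// monotonically with the float's value (so -0 and +0 map to the same
// integer, and adjacent representable doubles differ by exactly 1).
function orderedBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  const SIGN = 1n << 63n;
  return bits & SIGN ? SIGN - (bits ^ SIGN) : bits + SIGN;
}

// Number of representable doubles between a and b (their ULP distance).
function ulpDistance(a, b) {
  const d = orderedBits(a) - orderedBits(b);
  return d < 0n ? -d : d;
}

console.log(ulpDistance(0.1 + 0.2, 0.3));        // 1n
console.log(ulpDistance(1, 1 + Number.EPSILON)); // 1n
```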