Tiny numerical difference in sum-of-squares depending on numpy procedure call used

Question

I am writing a function which computes sum of squares of errors. x and y are vectors of the same length; y is observation data, x is data computed by my model.

The code is like:

>> res = y.ravel() - x.ravel()
>> np.dot(res.T, res)
>> 1026.7059479504269

>> np.sum(res**2)
>> 1026.7059479504273

Now look at the last two digits. Can anyone tell me what is the reason for it? Both calls should result in the same operations. Does anyone know where this difference comes from?

I just can guess it is about rounding values. So nothing to worry. — Tom-Oliver Heidel, May 29 '15 at 14:19
If I were to guess, I'd say it's how the `.dot` function works, which, while has the same result, would calculate the dot product on the fly, rather than square all the numbers first, and then add them together. While it's pretty inconsequential for the numbers you are dealing with, I'd wager that `.dot` is more accurate than `np.sum(res**2)` — Matthew, May 29 '15 at 14:23
I wonder if `np.sum()` is doing a running-sum, and `np.dot` isn't. You can experiment by changing the order of res (e.g. sort it from low to high for maximum accuracy). Anyway this is only at the 15th significant digit. — smci, May 29 '15 at 14:26
FYI: `res` is a 1D array, so there is no point in transposing it. The "transpose" of a 1D numpy array is itself. You can simply write `np.dot(res, res)`. — Warren Weckesser, May 29 '15 at 14:50
I not too sure that '.dot' is more accurate, '.dot' is a BLAS-call and most of the time optimized for speed. — tillsten, May 29 '15 at 15:13

Warren Weckesser · Answer 1 · 2015-05-29T15:03:34.050

Floating point addition is not associative--the order in which the operations are performed can affect the result. Presumably np.dot(res, res) and np.sum(res**2) perform the operations in different orders.

Here's an example where np.sum(x**2) != np.sum(x[:50]**2) + np.sum(x[50:]**2) (and neither equals np.dot(x, x)):

>>> np.random.seed(1234)
>>> x = np.random.randn(100)
>>> np.sum(x**2)
99.262119361371433
>>> np.sum(x[:50]**2) + np.sum(x[50:]**2)
99.262119361371461
>>> np.dot(x, x)
99.262119361371447

Tiny numerical difference in sum-of-squares depending on numpy procedure call used

1 Answers1