3

I am writing a function which computes sum of squares of errors. x and y are vectors of the same length; y is observation data, x is data computed by my model.

The code is like:

>> res = y.ravel() - x.ravel()
>> np.dot(res.T, res)
>> 1026.7059479504269

>> np.sum(res**2)
>> 1026.7059479504273

Now look at the last two digits. Can anyone tell me what is the reason for it? Both calls should result in the same operations. Does anyone know where this difference comes from?

smci
  • 32,567
  • 20
  • 113
  • 146
mrad
  • 46
  • 5
  • I just can guess it is about rounding values. So nothing to worry. – Tom-Oliver Heidel May 29 '15 at 14:19
  • Probably: http://stackoverflow.com/q/588004/3001761 – jonrsharpe May 29 '15 at 14:21
  • 1
    If I were to guess, I'd say it's how the `.dot` function works, which, while has the same result, would calculate the dot product on the fly, rather than square all the numbers first, and then add them together. While it's pretty inconsequential for the numbers you are dealing with, I'd wager that `.dot` is more accurate than `np.sum(res**2)` – Matthew May 29 '15 at 14:23
  • I wonder if `np.sum()` is doing a running-sum, and `np.dot` isn't. You can experiment by changing the order of res (e.g. sort it from low to high for maximum accuracy). Anyway this is only at the 15th significant digit. – smci May 29 '15 at 14:26
  • FYI: `res` is a 1D array, so there is no point in transposing it. The "transpose" of a 1D numpy array is itself. You can simply write `np.dot(res, res)`. – Warren Weckesser May 29 '15 at 14:50
  • I not too sure that '.dot' is more accurate, '.dot' is a BLAS-call and most of the time optimized for speed. – tillsten May 29 '15 at 15:13

1 Answers1

1

Floating point addition is not associative--the order in which the operations are performed can affect the result. Presumably np.dot(res, res) and np.sum(res**2) perform the operations in different orders.

Here's an example where np.sum(x**2) != np.sum(x[:50]**2) + np.sum(x[50:]**2) (and neither equals np.dot(x, x)):

>>> np.random.seed(1234)
>>> x = np.random.randn(100)
>>> np.sum(x**2)
99.262119361371433
>>> np.sum(x[:50]**2) + np.sum(x[50:]**2)
99.262119361371461
>>> np.dot(x, x)
99.262119361371447
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214