
While searching for some numpy stuff, I came across a question discussing the rounding accuracy of numpy.dot():

Numpy: Difference between dot(a,b) and (a*b).sum()

Since I happen to have two (different) computers with Haswell CPUs sitting on my desk, which should provide FMA and everything, I thought I'd test the example given by Ophion in the first answer, and I got a result that somewhat surprised me:

After updating/installing/fixing lapack/blas/atlas/numpy, I get the following on both machines:

>>> a = np.ones(1000, dtype=np.float128)+1e-14
>>> (a*a).sum()
1000.0000000000199999
>>> np.dot(a,a)
1000.0000000000199948

>>> a = np.ones(1000, dtype=np.float64)+1e-14
>>> (a*a).sum()
1000.0000000000198
>>> np.dot(a,a)
1000.0000000000176

So the ordinary multiplication followed by sum() is more precise than np.dot(). However, timeit confirmed that the .dot() version is faster (though not by much) for both float64 and float128.
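
A minimal sketch of such a timing comparison (the exact figures will of course depend on the machine and the BLAS build; this is just one way to check it):

>>> import numpy as np
>>> from timeit import timeit
>>> a = np.ones(1000, dtype=np.float64) + 1e-14
>>> timeit(lambda: (a * a).sum(), number=100000)  # multiply, then reduce with sum()
>>> timeit(lambda: np.dot(a, a), number=100000)   # single BLAS dot-product call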

Can anyone provide an explanation for this?

edit: I accidentally deleted the info on the numpy versions: I get the same results with 1.9.0 and 1.9.3, under Python 3.4.0 and 3.4.1.

  • Interestingly enough, I only get this discrepancy on NumPy 1.9.2, and not on NumPy 1.8.2. Both use BLAS + LAPACK (not ATLAS). With NumPy 1.8.2 the results of dot and sum are identical, suggesting the same rounding events; on NumPy 1.9.2, multiplication + sum() is more precise. – Alex Huszagh Oct 05 '15 at 18:25
  • See http://docs.scipy.org/doc/numpy/release.html#better-numerical-stability-for-sum-in-some-cases – Mark Dickinson Oct 05 '15 at 19:17
  • Also https://github.com/numpy/numpy/pull/3685, which is where the change in `sum` was implemented. – Mark Dickinson Oct 05 '15 at 19:18
  • Can anyone confirm that ATLAS uses FMA? I know the latest MKL does; it should be an interesting comparison. – Daniel Oct 05 '15 at 23:50

1 Answer


It looks like they recently added pairwise summation to ndarray.sum for improved numerical stability.

From PR 3685, this affects:

all add.reduce calls that go over float_add with IS_BINARY_REDUCE true
so this also improves mean/std/var and anything else that uses sum.

See the pull request for the code changes.
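
For intuition, here is a minimal pure-Python sketch of pairwise (divide-and-conquer) summation next to naive left-to-right accumulation. It is not NumPy's actual implementation (the real code is in C and works on larger blocks); the naive_sum/pairwise_sum names and the cutoff of 8 are made up for illustration of why the error grows more slowly:

import numpy as np

def naive_sum(x):
    # Left-to-right accumulation: rounding error grows roughly linearly with n.
    total = 0.0
    for v in x:
        total += v
    return total

def pairwise_sum(x):
    # Divide and conquer: partial sums stay of comparable magnitude,
    # so the rounding error grows only on the order of log(n).
    n = len(x)
    if n <= 8:  # small base case chosen for illustration; the real code uses a larger unrolled block
        return naive_sum(x)
    half = n // 2
    return pairwise_sum(x[:half]) + pairwise_sum(x[half:])

a = np.ones(1000, dtype=np.float64) + 1e-14
print(naive_sum(a * a))     # accumulates more rounding error
print(pairwise_sum(a * a))  # closer to the exact value 1000 * (1 + 1e-14)**2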

Waylon Flinn
  • A "divide and conquer" algorithm. Yes, that would make sense, thanks. – Johannes Hinrichs Oct 06 '15 at 09:23
  • Does this mean we are actually better off using alternative strategies for matrix computations than the built-in functions? That would be ironic, given that numpy's purpose is to ease scientific work. – Ando Jurai Jul 11 '16 at 07:22