2

Ubuntu16.04_64bit + Python3.5.2 + numpy1.13.3 + scipy1.0.0 I've got this problem when I'm dealing with the matrix multiplication between a scipy.sparse.csc.csc_matrix and an numpy.ndarray. I will give out an example here:

import numpy as np
import scipy.sparse

a = np.random.random(1000,1000)
b = np.random.random(1000,2000)
da = scipy.sparse.csc.csc_matrix(a)
db = scipy.sparse.csc.csc_matrix(b)

ab = a.dot(b)
dadb = da.dot(db)
dab = da.dot(b)

then the difference looks like this:

In [31]: np.sum(dadb.toarray() != ab)
Out[31]: 1869078

In [33]: np.sum(dab != dadb.toarray())
Out[33]: 0

In [34]: np.sum(dab != ab)
Out[34]: 1869078

Why? What makes the difference between them? What to do with it?

halfer
  • 19,824
  • 17
  • 99
  • 186
Woody. Wang
  • 135
  • 1
  • 9
  • 1
    Could you check with `np.allclose()`? – Divakar Dec 25 '17 at 14:18
  • `np.allclose(dab,ab,0,0)` is False. – Woody. Wang Dec 25 '17 at 14:24
  • What about : `np.allclose(dab,ab)`? – Divakar Dec 25 '17 at 14:27
  • @Woody.Wang `np.allclose(dab, ab, 0, 0)` is testing for perfect equality, but as the answer below points out, you shouldn't expect perfect equality in this case. `np.allclose(dab, ab)` has some tolerance in it to account for that. – Josh Karpel Dec 25 '17 at 14:42
  • In my test example `np.einsum('ij,jk', a, b)` matches `dab` (and `dadb`) better than `ab`. So even with the dense arrays, details of evaluation order can make small differences. – hpaulj Dec 25 '17 at 18:58

1 Answers1

5

What you are seeing is typical of floating point arithmetic (for a great explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic or the answers to Why Are Floating Point Numbers Inaccurate?). Unlike real arithmetic, the order of operations in floating point arithmetic will (slightly) change the results, because rounding errors accumulate in different ways. What this means is that different ways of computing the same result cannot be expected to agree exactly, but they will agree approximately.

You can see this if you use np.allclose instead of using exact equality:

>>> np.allclose(dab, ab)
True

>>> np.allclose(dadb.toarray(), ab)
True

In short, these operations are behaving as expected.

jakevdp
  • 77,104
  • 11
  • 125
  • 160
  • Also worth noting: "As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore using it will result on unexpected results or errors. The corresponding dense array should be obtained first instead..." from [the docs](https://docs.scipy.org/doc/scipy/reference/sparse.html#matrix-vector-product). `dense.dot(sparse)` is generally not a good idea, although the other order is fine. – Josh Karpel Dec 25 '17 at 14:39
  • In addition, if you really want `dense.dot(sparse)`, you can do `sparse.T.dot(dense.T).T` – Hameer Abbasi Dec 25 '17 at 16:01
  • In earlier versions, `np.dot` naively tried to make a dense array from the sparse matrix. In newer code `np.dot` delegates the action to the sparse method. `sparse.dot` is its default matrix multiplication. The problems in this question aren't with the use of `dot`. @JoshKarpel – hpaulj Dec 25 '17 at 18:53
  • Neat! I didn't realize that had been fixed. – Josh Karpel Dec 25 '17 at 20:43