Python: Inconsistency between sparse matrix multiplication and numpy.dot()

Question

Ubuntu16.04_64bit + Python3.5.2 + numpy1.13.3 + scipy1.0.0 I've got this problem when I'm dealing with the matrix multiplication between a scipy.sparse.csc.csc_matrix and an numpy.ndarray. I will give out an example here:

import numpy as np
import scipy.sparse

a = np.random.random(1000,1000)
b = np.random.random(1000,2000)
da = scipy.sparse.csc.csc_matrix(a)
db = scipy.sparse.csc.csc_matrix(b)

ab = a.dot(b)
dadb = da.dot(db)
dab = da.dot(b)

then the difference looks like this:

In [31]: np.sum(dadb.toarray() != ab)
Out[31]: 1869078

In [33]: np.sum(dab != dadb.toarray())
Out[33]: 0

In [34]: np.sum(dab != ab)
Out[34]: 1869078

Why? What makes the difference between them? What to do with it?

@Woody.Wang `np.allclose(dab, ab, 0, 0)` is testing for perfect equality, but as the answer below points out, you shouldn't expect perfect equality in this case. `np.allclose(dab, ab)` has some tolerance in it to account for that. — Josh Karpel, Dec 25 '17 at 14:42
In my test example `np.einsum('ij,jk', a, b)` matches `dab` (and `dadb`) better than `ab`. So even with the dense arrays, details of evaluation order can make small differences. — hpaulj, Dec 25 '17 at 18:58

jakevdp · Accepted Answer · 2017-12-25T14:42:26.890

5

What you are seeing is typical of floating point arithmetic (for a great explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic or the answers to Why Are Floating Point Numbers Inaccurate?). Unlike real arithmetic, the order of operations in floating point arithmetic will (slightly) change the results, because rounding errors accumulate in different ways. What this means is that different ways of computing the same result cannot be expected to agree exactly, but they will agree approximately.

You can see this if you use np.allclose instead of using exact equality:

>>> np.allclose(dab, ab)
True

>>> np.allclose(dadb.toarray(), ab)
True

In short, these operations are behaving as expected.

edited Dec 25 '17 at 14:42

answered Dec 25 '17 at 14:31

jakevdp

77,104
11
125
160

Also worth noting: "As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore using it will result on unexpected results or errors. The corresponding dense array should be obtained first instead..." from [the docs](https://docs.scipy.org/doc/scipy/reference/sparse.html#matrix-vector-product). `dense.dot(sparse)` is generally not a good idea, although the other order is fine. – Josh Karpel Dec 25 '17 at 14:39
In addition, if you really want `dense.dot(sparse)`, you can do `sparse.T.dot(dense.T).T` – Hameer Abbasi Dec 25 '17 at 16:01
In earlier versions, `np.dot` naively tried to make a dense array from the sparse matrix. In newer code `np.dot` delegates the action to the sparse method. `sparse.dot` is its default matrix multiplication. The problems in this question aren't with the use of `dot`. @JoshKarpel – hpaulj Dec 25 '17 at 18:53
Neat! I didn't realize that had been fixed. – Josh Karpel Dec 25 '17 at 20:43

Python: Inconsistency between sparse matrix multiplication and numpy.dot()

1 Answers1