I have to run a large number of similarity computations between vectors in a sparse matrix. What is currently the best tool for this task, `scipy.sparse` or `pandas`?

Saullo G. P. Castro

Igor Medeiros
-
I think you are looking for `scipy.sparse`: http://docs.scipy.org/doc/scipy/reference/sparse.html – Akavall Oct 04 '13 at 01:51
-
Yeah, I meant `scipy.sparse` vs `pandas`. The numpy citation was a mistake. – Igor Medeiros Oct 04 '13 at 01:53
-
`pandas` doesn't handle sparse arrays. `scipy.sparse` handles sparse linear algebra; `pandas` doesn't have any of this functionality, as far as I know. – Joe Kington Oct 04 '13 at 02:05
-
Do you know any alternatives to scipy sparse then? Could be in another language too. – Igor Medeiros Oct 04 '13 at 02:08
-
Matlab (and Octave) have good sparse capabilities, and the ideas for both come from Fortran or C++ implementations. Mostly these were developed with linear algebra problems in mind (i.e. solving linear equations like `A*x = b`, where `A` is a large sparse matrix). – hpaulj Oct 04 '13 at 02:45
-
pandas has had sparse support for several versions: http://pandas.pydata.org/pandas-docs/dev/sparse.html – Jeff Oct 04 '13 at 02:49
-
See also this answer: http://stackoverflow.com/questions/4623800/is-there-support-for-sparse-matrices-in-python – Felix Zumstein Oct 04 '13 at 06:11
1 Answer
After some research, I found that both pandas and SciPy have structures to represent sparse matrices efficiently in memory. However, neither has out-of-the-box support for computing similarity measures between vectors, such as cosine, adjusted cosine, or Euclidean. SciPy supports these on dense matrices only. For sparse matrices, SciPy supports dot products and other basic linear algebra operations.
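Since sparse dot products are available, cosine similarity can be assembled from them by hand. The following is a minimal sketch rather than a built-in SciPy feature: the function name `sparse_cosine_similarity` and the small example matrix are made up for illustration, and the code assumes the vectors are stored as the rows of the matrix.

```python
import numpy as np
from scipy import sparse

def sparse_cosine_similarity(X):
    """Pairwise cosine similarity between the rows of a sparse matrix."""
    X = sparse.csr_matrix(X, dtype=np.float64)
    # Euclidean norm of each row, computed without densifying X.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    # Leave all-zero rows as zero instead of dividing by zero.
    inv_norms = np.divide(1.0, norms, out=np.zeros_like(norms), where=norms > 0)
    # Scale each row to unit length; the result is still sparse.
    X_unit = sparse.diags(inv_norms) @ X
    # Dot products of unit-length rows are cosine similarities.
    return X_unit @ X_unit.T

# Hypothetical example: rows 0 and 2 are parallel, row 1 is orthogonal to both.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [2.0, 0.0, 4.0]]))
S = sparse_cosine_similarity(X)
print(S.toarray())
```

The result stays sparse, but it has one entry for every pair of rows that share at least one non-zero column, so for very large inputs it may be necessary to process the rows in blocks.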

Igor Medeiros
-
If you need a dense matrix, there's always dimensionality reduction (assuming it's the width of the matrix that's causing the matrix's size to be an issue). – wegry Jan 30 '15 at 17:30
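
One possible way to do the dimensionality reduction mentioned in the last comment is a truncated SVD, which `scipy.sparse.linalg.svds` computes directly on a sparse matrix. This is only a sketch of one option, not necessarily what the commenter had in mind; the random matrix, its shape, and the choice of `k = 20` are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Hypothetical data: 200 rows, 1000 columns, about 1% non-zero entries.
A = sparse.random(200, 1000, density=0.01, format="csr", random_state=0)

k = 20  # target number of dimensions (an arbitrary choice here)
U, s, Vt = svds(A, k=k)

# Coordinates of each row in the space of the k largest singular directions.
reduced = U * s  # dense array of shape (200, k)
print(reduced.shape)
```

The reduced matrix is dense and much narrower than the original, so the dense similarity routines can then be applied to it.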