I'm working with a very large sparse matrix multiplication (matmul) problem. As an example, let's say:
A is a binary 75 x 200,000 matrix. It's sparse, so I'm using CSC format for storage. I need to do the following matmul operation:
B = A.transpose() * A
The output is going to be a sparse, symmetric matrix of size 200K x 200K.
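A minimal sketch of the setup with scipy (the matrix here is a small random stand-in for the real 75 x 200,000 binary matrix, and the density value is made up for illustration):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical small stand-in for the real 75 x 200,000 binary matrix;
# the density value is invented for this example.
A = sp.random(75, 2000, density=0.01, format="csc", random_state=0)
A.data[:] = 1.0  # make it binary

B = A.T @ A  # sparse matmul; the result is 2000 x 2000 and symmetric
print(B.shape, B.nnz)
```

At the real 200K-column size, B's dense form would be 200,000 x 200,000, which is why the sparse result alone isn't enough of a guarantee that it fits in RAM.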
Unfortunately, B is going to be way too large to store in RAM (or "in core") on my laptop. Fortunately, B has a couple of properties that should make this problem tractable.
Since B will be symmetric about the diagonal and sparse, I could store only a triangular (upper or lower) portion of the result, and a sparse storage format would reduce the size further.
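For the triangular-storage idea, scipy.sparse already has `triu`/`tril` helpers; a sketch on a small stand-in (the matrix sizes and density are invented for illustration):

```python
import numpy as np
import scipy.sparse as sp

# Small stand-in for the real problem, just for illustration.
A = sp.random(75, 2000, density=0.01, format="csc", random_state=0)
A.data[:] = 1.0
B = A.T @ A

# Keep only the upper triangle (including the diagonal) -- roughly
# halves the storage for a symmetric matrix.
B_upper = sp.triu(B, format="csr")

# The full B can be recovered when needed by mirroring the strict
# upper triangle across the diagonal.
B_full = B_upper + sp.triu(B_upper, k=1).T
```

This only helps by about a factor of two, of course, so it's a complement to, not a substitute for, knowing the output size in advance.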
My question is: can numpy or scipy be told, ahead of time, what the output storage requirements are going to look like, so that I can select a storage solution up front and avoid the "matrix is too big" runtime error after several minutes (or hours) of calculation?
In other words, can the storage requirements for the matrix multiply be approximated by analyzing the contents of the two input matrices with an approximate counting algorithm?
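One cheap estimate follows from the structure of the product: B[i, j] is nonzero only when columns i and j of A share a nonzero row, so a row of A with k nonzeros contributes at most k**2 entries to B's pattern. Summing k**2 over rows gives an upper bound on nnz(B) in O(nnz(A)) time, without ever forming B. This is a sketch of that bound, not a scipy built-in:

```python
import numpy as np
import scipy.sparse as sp

def product_nnz_upper_bound(A):
    """Upper bound on nnz(A.T @ A): a row of A with k nonzeros can
    contribute at most k**2 entries to the output pattern, so
    sum(k**2) over rows bounds the total (overlapping patterns make
    this an overcount, never an undercount)."""
    row_nnz = np.diff(A.tocsr().indptr)            # nonzeros per row
    return int((row_nnz.astype(np.int64) ** 2).sum())
```

Because A has only 75 rows, this runs essentially instantly even at 200K columns; if the bound already exceeds available RAM, an out-of-core route is clearly needed before any multiplication starts.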
If not, I'm looking into a brute-force solution: something involving map/reduce, out-of-core storage, or a matmul subdivision scheme (e.g. Strassen's algorithm) from the following web links:
A couple of Map/Reduce problem-subdivision solutions:
- http://www.norstad.org/matrix-multiply/index.html
- http://bpgergo.blogspot.com/2011/08/matrix-multiplication-in-python.html
An out-of-core (PyTables) storage solution
A matmul subdivision solution:
- https://en.wikipedia.org/wiki/Strassen_algorithm
- http://facultyfp.salisbury.edu/taanastasio/COSC490/Fall03/Lectures/FoxMM/example.pdf
- http://eli.thegreenplace.net/2012/01/16/python-parallelizing-cpu-bound-tasks-with-multiprocessing/
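Before reaching for Map/Reduce, a simpler out-of-core subdivision may suffice: compute B one column block at a time and write each slab to disk, so only one 200K x block_cols slice is ever in RAM. A sketch under that assumption (the block size and the file-naming convention are made up for illustration):

```python
import scipy.sparse as sp

def matmul_in_blocks(A, block_cols=10_000, out_prefix="B_block"):
    """Compute B = A.T @ A one column block at a time, saving each
    slab of B's columns to disk instead of holding all of B in RAM.
    The out_prefix naming scheme is a made-up convention for this
    sketch, not a library feature."""
    n = A.shape[1]
    At = A.T.tocsr()                        # transpose once, up front
    for start in range(0, n, block_cols):
        stop = min(start + block_cols, n)
        block = At @ A[:, start:stop]       # one slab of B's columns
        sp.save_npz(f"{out_prefix}_{start}_{stop}.npz", block.tocsc())
```

Each slab can later be reloaded with `sp.load_npz`, and given B's symmetry, only the entries at or above the diagonal within each slab actually need to be kept.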
Thanks in advance for any recommendations, comments, or guidance!