
Given two sparse scipy matrices A and B, I want to compute the row-wise outer product.

I can do this with numpy in a number of ways; perhaps the easiest is

np.einsum('ij,ik->ijk', A, B).reshape(n, -1)

or

(A[:, :, np.newaxis] * B[:, np.newaxis, :]).reshape(n, -1)

where n is the number of rows in A and B.
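
For concreteness, a minimal self-contained sketch of the dense approach (here A and B are plain dense arrays and the sizes are illustrative only), showing that the two expressions agree:

import numpy as np

n, l, k = 4, 3, 5                      # illustrative sizes
A = np.random.rand(n, l)
B = np.random.rand(n, k)

out1 = np.einsum('ij,ik->ijk', A, B).reshape(n, -1)
out2 = (A[:, :, np.newaxis] * B[:, np.newaxis, :]).reshape(n, -1)
print(np.allclose(out1, out2))         # True: both give the row-wise outer product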

In my case, however, going through dense matrices eats up way too much RAM. The only option I have found is thus to use a Python loop:

sp.sparse.vstack((ra.T@rb).reshape(1,-1) for ra, rb in zip(A,B)).tocsr()

While using less RAM, this is very slow.
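
For reference, here is a self-contained version of that loop (a sketch; the sizes and density are illustrative, and A and B are assumed to be CSR matrices with the same number of rows):

from scipy import sparse

A = sparse.random(1000, 768, 0.03, format='csr')   # illustrative size/density
B = sparse.random(1000, 768, 0.03, format='csr')

# one (1, 768*768) sparse row per pair of input rows, stacked into the result
C = sparse.vstack([(ra.T @ rb).reshape(1, -1) for ra, rb in zip(A, B)]).tocsr()
print(C.shape)                                     # (1000, 589824)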

My question is thus: is there a sparse (RAM-efficient) way to take the row-wise outer product of two matrices that keeps things vectorized?

(A similar question is numpy elementwise outer product with sparse matrices but all answers there go through dense matrices.)

Thomas Ahle
    What are typical sizes of A and B? – Divakar Jul 18 '19 at 17:49
  • @Divakar I've got 1,000,000 rows and 768 columns. The typical number of non-zeros per column is about 25. – Thomas Ahle Jul 18 '19 at 19:44
  • Iterating over the rows of a sparse matrix is slow, since it requires making a new (1,n) sparse matrix. The @ should be fast enough. The reshape should be OK, since it simplifies the `csr` `indptr`. `vstack` combines the `coo` attributes of all input matrices, which for that many rows could be slow. – hpaulj Jul 18 '19 at 21:04
  • `lil` format has a fast row `view` method, but it would still have to be turned into `csr` format for multiplication. – hpaulj Jul 18 '19 at 21:06
  • So you expect a matrix with 1,000,000 rows and 768*768 columns? – hpaulj Jul 18 '19 at 21:11
  • @hpaulj Correct, but only about 600 non-zeros per row. Can I do this with a `lil` matrix? – Thomas Ahle Jul 18 '19 at 21:36

1 Answer


We can directly calculate the csr representation of the result. It's not superfast (~3 seconds on 100,000x768) but may be ok, depending on your use case:

import numpy as np
import itertools
from scipy import sparse

def spouter(A,B):
    N,L = A.shape
    N,K = B.shape
    # per-row nonzero values of A and B (split at the row boundaries given by indptr)
    drows = zip(*(np.split(x.data,x.indptr[1:-1]) for x in (A,B)))
    # data of the result: outer product of each pair of rows' nonzero values
    data = [np.outer(a,b).ravel() for a,b in drows]
    # per-row column indices of A and B
    irows = zip(*(np.split(x.indices,x.indptr[1:-1]) for x in (A,B)))
    # column indices of the result: pair up the indices and map them into the flattened L*K row
    indices = [np.ravel_multi_index(np.ix_(a,b),(L,K)).ravel() for a,b in irows]
    # indptr: cumulative count of nonzeros per output row
    indptr = np.fromiter(itertools.chain((0,),map(len,indices)),int).cumsum()
    return sparse.csr_matrix((np.concatenate(data),np.concatenate(indices),indptr),(N,L*K))

A = sparse.random(100,768,0.03).tocsr()
B = sparse.random(100,768,0.03).tocsr()

print(np.all(np.einsum('ij,ik->ijk',A.A,B.A).reshape(100,-1) == spouter(A,B).A))

A = sparse.random(100000,768,0.03).tocsr()
B = sparse.random(100000,768,0.03).tocsr()

from time import time
T = time()
C = spouter(A,B)
print(time()-T)

Sample run:

True
3.1073222160339355
Paul Panzer
  • This appears to be faster than my version. Do you mind explaining a little bit about how it works? – Thomas Ahle Jul 19 '19 at 09:28
  • It works row by row. Finding the nonzeros in A[i] outer B[i] means pairing all nonzeros in A[i] and B[i] -> value = A[i].data[j]*B[i].data[k] at (A[i].indices[j], B[i].indices[k]); the index pair has then to be converted to a linear index. – Paul Panzer Jul 19 '19 at 10:50
  • What I don't get is why it's so much faster than my version (3), given they both do a full Python loop over all the rows? – Thomas Ahle Jul 19 '19 at 21:01
  • As long as the innermost loop is vectorized, looping is not great but not terrible either. So the difference comes down to how the innermost loop is implemented. You are using matmul, a quite general function, while I use code that can only do this special case (outer product of two (sparse) vectors). Also, I think your loop creates full sparse matrix objects for each row, which I reckon is quite a bit of overhead. – Paul Panzer Jul 19 '19 at 21:18
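
To illustrate the per-row index pairing described in the comments above, here is a minimal sketch for a single pair of rows (toy values, purely for illustration) showing how the data and column indices of the output row are built:

import numpy as np

# nonzero values and column indices of one row of A (L = 6 columns) and one row of B (K = 4 columns)
a_data, a_idx = np.array([2., 3.]), np.array([1, 4])
b_data, b_idx = np.array([5., 7.]), np.array([0, 2])
L, K = 6, 4

# every nonzero of the outer product pairs one nonzero of a with one nonzero of b
data = np.outer(a_data, b_data).ravel()                            # [10. 14. 15. 21.]
# each (column index in A, column index in B) pair maps to a linear index in the flattened L*K row
cols = np.ravel_multi_index(np.ix_(a_idx, b_idx), (L, K)).ravel()  # [ 4  6 16 18]
print(data, cols)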