47

Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?

ogrisel
  • 39,309
  • 12
  • 116
  • 125
  • If `d` is a sparse matrix of the same size as `a` you can use `a.multiply(d)`. Perhaps you can make a `d` that's N rows long and loop over N rows of `a` at a time? – mtrw Jul 14 '10 at 19:43
  • 1
    But d is dense and cannot be broadcasted explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a python loop. – ogrisel Jul 14 '10 at 22:15
  • I guess the problem is you want the representation of a (sparse) matrix but the mulitply operation of an array. I think you're going to have to roll your own unfortunately. – mtrw Jul 15 '10 at 19:33
  • 1
    Actually there is `a.multply(d)` that should do exactly that but it does not do the broadcasting as usual. Maybe it's a bug. – ogrisel Jul 17 '10 at 10:16

3 Answers3

50

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!

mitmatt
  • 681
  • 5
  • 3
  • The great thing about this is that it also works when `X` is an `ndarray` or a dense matrix. +1. – Fred Foo Sep 08 '12 at 16:40
  • 6
    This could be further simplified using [`scipy.sparse.diags(d, 0)`](http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.sparse.diags.html) rather than `lil_matrix` – ali_m Jan 21 '16 at 23:05
28

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH

salma
  • 281
  • 3
  • 2
  • 3
    @K3---rnc the result is dense only if B is dense. If you convert B to any of the sparse formats, it will do the trick. E.g. A.multiply(csc_matrix(B)) – markhor Jan 23 '17 at 18:27
1

Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

import scipy.sparse as ssp
def pointmult(a,b):
    x = a.copy()
    for i in xrange(a.shape[0]):
        if x.data[i]:
            for j in xrange(len(x.data[i])):
                x.data[i] *= b[x.rows[i]]
    return x

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.

Justin Peel
  • 46,722
  • 6
  • 58
  • 80
  • thanks I would have liked to avoid for loops in python however. But maybe there is no way out with the current scipy.sparse classes for this use case. – ogrisel Jul 19 '10 at 17:49