12

Is there an easy/build-in way to get the element-wise maximum of two (or ideally more) sparse matrices? I.e. a sparse equivalent of np.maximum.

Ivan Chaer
  • 6,980
  • 1
  • 38
  • 48
Maarten
  • 4,549
  • 4
  • 31
  • 36
  • What exactly do you mean by 'element-wise'? If I go to the sparse coo_matrix page I see functions like `arcsin() Element-wise arcsin.` But no `max`. Do you want the largest value in each matrix; largest along some dimension; largest across the set of matrices? – hpaulj Oct 11 '13 at 07:30
  • 3
    No offense, but I think element-wise is pretty unambiguous. In: two matrices A, B with the same dimensions. Out: A matrix C where C[i,j] = max(A[i,j], B[i,j]) – Maarten Oct 11 '13 at 10:10

7 Answers7

5

This did the trick:

def maximum (A, B):
    BisBigger = A-B
    BisBigger.data = np.where(BisBigger.data < 0, 1, 0)
    return A - A.multiply(BisBigger) + B.multiply(BisBigger)
Maarten
  • 4,549
  • 4
  • 31
  • 36
  • Counterexample: `A = coo_matrix([1,1]); B = coo_matrix([0,2]); maximum(A, B).A` gives `array([[0, 2]])`. – Fred Foo Oct 11 '13 at 13:22
3

No, there's no built-in way to do this in scipy.sparse. The easy solution is

np.maximum(X.A, Y.A)

but this is obviously going to be very memory-intensive when the matrices have large dimensions and it might crash your machine. A memory-efficient (but by no means fast) solution is

# convert to COO, if necessary
X = X.tocoo()
Y = Y.tocoo()

Xdict = dict(((i, j), v) for i, j, v in zip(X.row, X.col, X.data))
Ydict = dict(((i, j), v) for i, j, v in zip(Y.row, Y.col, Y.data))

keys = list(set(Xdict.iterkeys()).union(Ydict.iterkeys()))

XmaxY = [max(Xdict.get((i, j), 0), Ydict.get((i, j), 0)) for i, j in keys]
XmaxY = coo_matrix((XmaxY, zip(*keys)))

Note that this uses pure Python instead of vectorized idioms. You can try shaving some of the running time off by vectorizing parts of it.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • Maybe I should have mentioned I tried that before and it crashed my MacBook, because I did not have enough disk space for the swap file. – Maarten Oct 11 '13 at 11:46
1

Here's another memory-efficient solution that should be a bit quicker than larsmans'. It's based on finding the set of unique indices for the nonzero elements in the two arrays using code from Jaime's excellent answer here.

import numpy as np
from scipy import sparse

def sparsemax(X, Y):

    # the indices of all non-zero elements in both arrays
    idx = np.hstack((X.nonzero(), Y.nonzero()))

    # find the set of unique non-zero indices
    idx = tuple(unique_rows(idx.T).T)

    # take the element-wise max over only these indices
    X[idx] = np.maximum(X[idx].A, Y[idx].A)

    return X

def unique_rows(a):
    void_type = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    b = np.ascontiguousarray(a).view(void_type)
    idx = np.unique(b, return_index=True)[1]
    return a[idx]

Testing:

def setup(n=1000, fmt='csr'):
    return sparse.rand(n, n, format=fmt), sparse.rand(n, n, format=fmt)

X, Y = setup()
Z = sparsemax(X, Y)
print np.all(Z.A == np.maximum(X.A, Y.A))
# True

%%timeit X, Y = setup()
sparsemax(X, Y)
# 100 loops, best of 3: 4.92 ms per loop
Community
  • 1
  • 1
ali_m
  • 71,714
  • 23
  • 223
  • 298
  • 1
    Ah, forget about it - Maarten's solution is many orders of magnitude quicker than mine! – ali_m Oct 12 '13 at 00:32
1

The latest scipy (13.0) defines element-wise booleans for sparse matricies. So:

BisBigger = B>A
A - A.multiply(BisBigger) + B.multiply(BisBigger)

np.maximum does not (yet) work because it uses np.where, which is still trying to get the truth value of an array.

Curiously B>A returns a boolean dtype, while B>=A is float64.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Too bad scipy 13.0 is not in PyPi yet. – Maarten Oct 13 '13 at 02:07
  • I am not sure what `B>=A` does. But if it would do what you'd except it to do and you have two very sparse matrices A and B, you'd get a very dense matrix as a result. – Maarten Oct 13 '13 at 02:21
  • Good point. It explains the warning I get: `"SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, try using <, >, or !=, instead."` – hpaulj Oct 13 '13 at 02:35
1

Here is a function that returns a sparse matrix that is element-wise maximum of two sparse matrices. It implements the answer by hpaulj:

def sparse_max(A, B):
    """
    Return the element-wise maximum of sparse matrices `A` and `B`.
    """
    AgtB = (A > B).astype(int)
    M = AgtB.multiply(A - B) + B
    return M

Testing:

A = sparse.csr_matrix(np.random.randint(-9,10, 25).reshape((5,5))) 
B = sparse.csr_matrix(np.random.randint(-9,10, 25).reshape((5,5)))

M = sparse_max(A, B)
M2 = sparse_max(B, A)

# Test symmetry:
print((M.A == M2.A).all())
# Test that M is larger or equal to A and B, element-wise:
print((M.A >= A.A).all())
print((M.A >= B.A).all())
hsxavier
  • 89
  • 3
0
from scipy import sparse
from numpy import array
I = array([0,3,1,0])
J = array([0,3,1,2])
V = array([4,5,7,9])
A = sparse.coo_matrix((V,(I,J)),shape=(4,4))

A.data.max()
9

If you haven't already, you should try out ipython, you could have saved your self time my making your spare matrix A then simply typing A. then tab, this will print a list of methods that you can call on A. From this you would see A.data gives you the non-zero entries as an array and hence you just want the maximum of this.

Greg
  • 11,654
  • 3
  • 44
  • 50
  • Thanks for you answer, but you misunderstood the question. I edited it to make it more clear. – Maarten Oct 11 '13 at 10:15
  • By element wise you mean, of matrix A and B which has the larger maximum element? – Greg Oct 11 '13 at 10:19
  • Then unless your array's are extremely large I would just convert them to arrays 'A.toarray()' and use this method. If they are extremely large and this breaks your computer then I have no idea sorry. – Greg Oct 11 '13 at 10:30
  • This answer doesn't work for getting the element-wise max of two matrices, and even for the max of one matrix it's incorrect: what if the max of `.data` is negative? – Fred Foo Oct 11 '13 at 11:13
0

On the current SciPy you may use the object method maximum():

mM = mA.maximum(mB)
Royi
  • 4,640
  • 6
  • 46
  • 64