Create a new sparse matrix from the operations of different rows of a given large sparse matrix in python

Question

Lets say , S is the large scipy-csr-matrix(sparse) and a dictionary D with key -> index(position) of the row vector A in S & values -> list of all the indices(positions) of other row vectors l in S. For each row vector in l you subtract A and get the new vector which will be nothing but the new row vector to be updated in the new sparse matrix.

dictionary of form -> { 1 : [4 , 5 ,... ,63] } then have to create a new sparse matrix with....

new_row_vector_1 -> S_vec1 - S_vec4

new_row_vector_2 -> S_vec1 - S_vec5

.

new_row_vector_n -> S_vec1 - S_vec63

where S_vecX is the Xth row vector of matrix S

Check out the pictorial explanation of the above statements

Numpy Example:

>>> import numpy as np
>>> s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
>>> s
array([[ 1,  5,  3,  4],
       [ 3,  0, 12,  7],
       [ 5,  6,  2,  4],
       [ 4,  6,  6,  4],
       [ 7, 12,  5, 67]])
>>> index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]} 
>>> n = np.zeros((10,4)) #sum of all lengths of values in index_dictionary would be the number of rows for the new array(n) and columns remain the same as s.
>>> n
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> idx = 0
>>> for index in index_dictionary:
...     for k in index_dictionary[index]:
...             n[idx] = s[index]-s[k]
...             idx += 1
... 
>>> n
array([[ -4.,  -1.,   1.,   0.],
       [ -6.,  -7.,  -2., -63.],
       [ -1.,  -6.,   6.,   3.],
       [ -4., -12.,   7., -60.],
       [  2.,   6., -10.,  -3.],
       [  1.,   6.,  -6.,  -3.],
       [ -1.,   0.,   4.,   0.],
       [  4.,  12.,  -7.,  60.],
       [  3.,   6.,  -1.,  63.],
       [  2.,   6.,   3.,  63.]])

n is what i want.

Demonstrate how you would do this with a small regular `numpy` array. With a good working example we can then worry about whether the sparse matrix needs any special handling. — hpaulj, Dec 11 '16 at 17:49
Thanks for the comment. Please check, added an example around how we can do it in case of numpy arrays. — xlax, Dec 11 '16 at 21:55
Could you give some context as to why you need this as a somewhat higher level? My intuition is that by rethinking the problem a bit it may be mapped to numpy operations much more elegantly. — Eelco Hoogendoorn, Dec 11 '16 at 21:58

score 0 · Answer 1 · edited May 23 '17 at 12:00

Here's a simple demonstration of what I think you are trying to do:

First the numpy array version:

In [619]: arr=np.arange(12).reshape(4,3)
In [620]: arr[[1,0,2,3]]-arr[0] 
Out[620]: 
array([[3, 3, 3],
       [0, 0, 0],
       [6, 6, 6],
       [9, 9, 9]])

Now the sparse equivalent:

In [622]: M=sparse.csr_matrix(arr)

csr implements row indexing:

In [624]: M[[1,0,2,3]]
Out[624]: 
<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 11 stored elements in Compressed Sparse Row format>
In [625]: M[[1,0,2,3]].A
Out[625]: 
array([[ 3,  4,  5],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [ 9, 10, 11]], dtype=int32)

But not broadcasting:

In [626]: M[[1,0,2,3]]-M[0]
 ....
ValueError: inconsistent shapes

So we can use an explicit form of broadcasting

In [627]: M[[1,0,2,3]]-M[[0,0,0,0]]  # or M[[0]*4]
Out[627]: 
<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>
In [628]: _.A
Out[628]: 
array([[3, 3, 3],
       [0, 0, 0],
       [6, 6, 6],
       [9, 9, 9]], dtype=int32)

This may not be the fastest or most efficient, but it's a start.

I found in a previous SO question that M[[1,0,2,3]] indexing is performed with a matrix multiplication, in this case the equivalent of:

idxM = sparse.csr_matrix(([1,1,1,1],([0,1,2,3],[1,0,2,3])),(4,4))
M1 = idxM * M

Sparse matrix slicing using list of int

So my difference expression requires 2 such multiplication along with the subtraction.

We could try a row by row iteration, and building a new matrix from the result, but there's no guarantee that it will be faster. Depending on the arrays, converting to dense and back might even be faster.

=================

I can imagine 2 ways of applying this to the dictionary.

One is to iterate through the dictionary (what order?), perform this difference for each key, collect the results in a list (list of sparse matrices), and use sparse.bmat to join them into one matrix.

Another is to collect two lists of indexes, and apply the above indexed difference just once.

In [8]: index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]} 
In [10]: alist=[]
    ...: for index in index_dictionary:
    ...:     for k in index_dictionary[index]:
    ...:         alist.append((index, k))       
In [11]: idx = np.array(alist)
In [12]: idx
Out[12]: 
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])

applied to your dense s:

In [15]: s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
In [16]: s[idx[:,0]]-s[idx[:,1]]
Out[16]: 
array([[ -4,  -1,   1,   0],
       [ -6,  -7,  -2, -63],
       [ -1,  -6,   6,   3],
       [ -4, -12,   7, -60],
       [  2,   6, -10,  -3],
       [  1,   6,  -6,  -3],
       [ -1,   0,   4,   0],
       [  4,  12,  -7,  60],
       [  3,   6,  -1,  63],
       [  2,   6,   3,  63]])

and to the sparse equivalent

In [19]: arr= sparse.csr_matrix(s)
In [20]: arr
Out[20]: 
<5x4 sparse matrix of type '<class 'numpy.int32'>'
    with 19 stored elements in Compressed Sparse Row format>
In [21]: res=arr[idx[:,0]]-arr[idx[:,1]]
In [22]: res
Out[22]: 
<10x4 sparse matrix of type '<class 'numpy.int32'>'
    with 37 stored elements in Compressed Sparse Row format>

Create a new sparse matrix from the operations of different rows of a given large sparse matrix in python

1 Answers1