I've tried to initialize csc_matrix and csr_matrix from a list of (data, (rows, cols)) values as the documentation suggests.
sparse = csc_matrix((data, (rows, cols)), shape=(n, n))
The problem is that the method I actually have for generating the data, rows and cols vectors introduces duplicates for some points. By default, scipy adds the values of the duplicate entries. However, in my case, those duplicates have exactly the same value in data for a given (row, col).
What I'm trying to achieve is to make scipy ignore the second entry if one already exists, instead of adding them.
Ignoring the fact that I could improve the generation algorithm to avoid generating duplicates, is there a parameter or another way of creating a sparse matrix that ignores duplicates?
Currently, two entries with data = [4, 4]; cols = [1, 1]; rows = [1, 1]; generate a sparse matrix whose value at (1,1) is 8, while the desired value is 4.
>>> from scipy.sparse import csc_matrix
>>> c = csc_matrix(([4, 4], ([1, 1], [1, 1])), shape=(3, 3))
>>> c.todense()
matrix([[0, 0, 0],
        [0, 8, 0],
        [0, 0, 0]])
I'm also aware that I could filter them by using a 2-dimensional numpy unique function, but the lists are quite large, so this is not really a valid option.
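For reference, the numpy unique filtering mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: `dedup_coo` is a hypothetical helper name (not part of scipy), the matrix is assumed to be n x n, and each (row, col) pair is collapsed into a single integer key so that a 1-dimensional `np.unique` can be used instead of a 2-dimensional one.

```python
import numpy as np
from scipy.sparse import csc_matrix


def dedup_coo(data, rows, cols, n):
    """Hypothetical helper: keep one entry per (row, col) pair."""
    data = np.asarray(data)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # Collapse each (row, col) pair into one integer key (assumes cols < n).
    keys = rows.astype(np.int64) * n + cols
    # return_index gives the position of the first occurrence of each key.
    _, first = np.unique(keys, return_index=True)
    return data[first], rows[first], cols[first]


data, rows, cols = dedup_coo([4, 4], [1, 1], [1, 1], n=3)
c = csc_matrix((data, (rows, cols)), shape=(3, 3))
print(c.toarray())
```

This still sorts the keys internally, so it does not avoid the cost I'm worried about for large lists; it only shows what the filtering step would be.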
Another possible answer to the question would be: is there any way of specifying what to do with duplicates, i.e. keeping the min or max instead of the default sum?
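To make the min/max idea concrete, here is a minimal sketch of the behaviour I have in mind, done by hand in numpy before the matrix is built. This is an assumption about how it could be emulated, not a scipy parameter: `combine_duplicates` is a hypothetical helper name, the matrix is assumed to be n x n, and the reduction is done by sorting on a combined key and applying a ufunc's `reduceat` to each run of equal keys.

```python
import numpy as np
from scipy.sparse import csc_matrix


def combine_duplicates(data, rows, cols, n, ufunc=np.maximum):
    """Hypothetical helper: reduce duplicate (row, col) entries with ufunc."""
    data = np.asarray(data)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    if data.size == 0:
        # Nothing to reduce; reduceat would fail on empty input.
        return data, rows, cols
    # One integer key per (row, col) pair (assumes cols < n).
    keys = rows.astype(np.int64) * n + cols
    order = np.argsort(keys, kind="stable")
    keys, data = keys[order], data[order]
    # Index where each run of equal keys starts.
    starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])
    reduced = ufunc.reduceat(data, starts)
    uniq = keys[starts]
    return reduced, uniq // n, uniq % n


d, r, c = combine_duplicates([4, 7, 2], [1, 1, 0], [1, 1, 2], n=3)
m_max = csc_matrix((d, (r, c)), shape=(3, 3))

d, r, c = combine_duplicates([4, 7, 2], [1, 1, 0], [1, 1, 2], n=3,
                             ufunc=np.minimum)
m_min = csc_matrix((d, (r, c)), shape=(3, 3))
```

Passing `np.minimum` or `np.maximum` selects which duplicate is kept. What I'm asking is whether scipy offers something equivalent directly at construction time.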