So scikit-learn's DBSCAN accepts sparse matrices, and if the matrix isn't in csr_matrix format, it converts it. I'd like to pass in a csr_matrix, but then I get this warning:
EfficiencyWarning: Precomputed sparse input was not sorted by data.
How do I create a data-sorted csr_matrix? If I initialize the matrix with the data already sorted, the constructor automatically re-sorts it by index:
>>> from scipy.sparse import csr_matrix
>>> x = csr_matrix(([1,2,3],[[3,2,1],[5,2,1]]))
>>> print(x)
(1, 1) 3
(2, 2) 2
(3, 5) 1
I know csr_matrix has a has_sorted_indices flag, but I'm not sure how to use it. Even if I set it to False, the matrix is still sorted by indices.
Edited: I tried sorted_indices but it doesn't seem to change anything. I'm not sure if my concept of sorted_indices is correct? Is it supposed to sort the data from low to high per row?
>>> from scipy.sparse import csr_matrix
>>> x = csr_matrix(([7,3,5,1,6,2], [[0,1,2,0,1,2],[0,0,0,1,1,1]]), shape=(3, 2))
>>> print(x)
(0, 0) 7
(0, 1) 1
(1, 0) 3
(1, 1) 6
(2, 0) 5
(2, 1) 2
>>> x.has_sorted_indices = False
>>> x.sort_indices()
>>> print(x)
(0, 0) 7
(0, 1) 1
(1, 0) 3
(1, 1) 6
(2, 0) 5
(2, 1) 2
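For reference, here is a small sketch of what sort_indices() appears to do, as far as I understand it: it reorders each row by column index and does not look at the values. The arrays below are made-up test values, and I am assuming the three-array constructor csr_matrix((data, indices, indptr)) stores them in the order given.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build a CSR matrix directly from (data, indices, indptr), with the column
# indices of every row deliberately out of order (illustrative values only).
data = np.array([1, 7, 3, 6, 2, 5])
indices = np.array([1, 0, 0, 1, 1, 0])
indptr = np.array([0, 2, 4, 6])
m = csr_matrix((data, indices, indptr), shape=(3, 2))

# In place: reorders the entries of each row by column index, not by value.
m.sort_indices()
print(m.indices)  # [0 1 0 1 0 1]
print(m.data)     # [7 1 3 6 5 2]
```

So in the example above the matrix is already sorted by column index, which would explain why the call changes nothing.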
What I want (is this possible or not?):
(0, 1) 1
(0, 0) 7
(1, 0) 3
(1, 1) 6
(2, 1) 2
(2, 0) 5
Basically I need this to return True:
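import numpy as np

def is_sorted_by_data(graph):  # graph is a csr_matrix
    # a stored value may only drop at the last entry of a row
    out_of_order = graph.data[:-1] > graph.data[1:]
    line_change = np.unique(graph.indptr[1:-1] - 1)
    line_change = line_change[line_change < out_of_order.shape[0]]
    return out_of_order.sum() == out_of_order[line_change].sum()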
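One possible way to get that layout, as a minimal sketch rather than a confirmed fix for the warning: reorder the stored entries of each row by value and rebuild the matrix from the raw (data, indices, indptr) arrays. The helper name sort_rows_by_data is hypothetical, and the sketch assumes that this three-array constructor keeps the arrays in the order it is given.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sort_rows_by_data(graph):
    """Return a CSR matrix whose stored values ascend within each row.

    Hypothetical helper; assumes csr_matrix((data, indices, indptr))
    does not reorder the arrays it receives.
    """
    graph = graph.tocsr()
    # Row number of every stored entry, recovered from indptr.
    row_ids = np.repeat(np.arange(graph.shape[0]), np.diff(graph.indptr))
    # lexsort treats the last key as primary: group by row, then by value.
    order = np.lexsort((graph.data, row_ids))
    return csr_matrix(
        (graph.data[order], graph.indices[order], graph.indptr.copy()),
        shape=graph.shape,
    )

x = csr_matrix(([7, 3, 5, 1, 6, 2], ([0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1])),
               shape=(3, 2))
y = sort_rows_by_data(x)
print(y.data)     # [1 7 3 6 2 5]
print(y.indices)  # [1 0 0 1 1 0]
```

The number of entries per row is unchanged, so indptr can be reused as-is. A later call to sort_indices() would put the entries back into column order, so the result would have to be passed on without index-sorting it again.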