I have the following Pandas statement that uses the apply method, which can take up to 2 minutes to run.
I read that, to improve the speed, I should vectorize the statement. My original statement looks like this:
output_data["on_s"] = output_data["m_ind"].apply(lambda x: my_matrix[x, 0] + my_matrix[x, 1] + my_matrix[x, 2])
Where my_matrix is a scipy.sparse matrix. So my initial step was to use the sum method:
summed_matrix = my_matrix.sum(axis=1)
But after this point I am stuck on how to proceed.
Update: Including example data
The matrix looks like this (scipy.sparse.csr_matrix):
(290730, 2) 0.3058016922838267
(290731, 2) 0.3390328430763723
(290733, 2) 0.0838999800585995
(290734, 2) 0.0237008960604337
(290735, 2) 0.0116864263235209
output_data["m_ind"] is just a Pandas Series that has some values like so:
97543
97544
97545
97546
97547
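For reference, here is a minimal sketch that reproduces the setup with small made-up data and shows one way the row-sum idea might be continued. The matrix values, its 4x3 shape, and the m_ind values are assumptions for illustration, not my real data. Note that the original statement adds only columns 0, 1 and 2, so the sketch slices those columns before summing; my_matrix.sum(axis=1) over the whole matrix would only match if the matrix has exactly three columns.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical small stand-in for my_matrix (the real one is far larger).
dense = np.array([
    [0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 0.25],
])
my_matrix = sparse.csr_matrix(dense)

# Hypothetical stand-in for output_data["m_ind"]: row indices into my_matrix.
output_data = pd.DataFrame({"m_ind": [3, 0, 0, 1]})

# Original row-by-row version, kept for comparison.
slow = output_data["m_ind"].apply(
    lambda x: my_matrix[x, 0] + my_matrix[x, 1] + my_matrix[x, 2]
)

# Possible vectorized version: sum the first three columns once for every
# row, flatten the result to a 1-D array, then look up the sums by row index.
row_sums = np.asarray(my_matrix[:, :3].sum(axis=1)).ravel()
output_data["on_s"] = row_sums[output_data["m_ind"].to_numpy()]
```

On this toy data both versions give the same column, but I have not checked the second one against the real matrix.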