I have this smaller example of my sparse matrix:
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([-0.05, 0.025, 0.025, -0.05, 0.025, 0.025, 0.9, -2.50, 1.6, 0.9, 1.6, -4.9, 2.4, 2.4, -5.6, 3.2, 3.2, -5.6, 2.4, 0.9, 2.4, -4.9, 1.6, 0.9, 1.6, -2.5])
row = np.array([0,0,0,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,5,6,6,6,6,7,7,7])
col = np.array([0,2,6,1,3,7,0,2,3,1,2,3,4,3,4,5,4,5,6,1,5,6,7,0,6,7])
sp_m = csr_matrix((data, (row, col)))
As a dense array (sp_m.toarray()), it looks like this:
array([[-0.05 , 0. , 0.025, 0. , 0. , 0. , 0.025, 0. ],
[ 0. , -0.05 , 0. , 0.025, 0. , 0. , 0. , 0.025],
[ 0.9 , 0. , -2.5 , 1.6 , 0. , 0. , 0. , 0. ],
[ 0. , 0.9 , 1.6 , -4.9 , 2.4 , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 2.4 , -5.6 , 3.2 , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 3.2 , -5.6 , 2.4 , 0. ],
[ 0. , 0.9 , 0. , 0. , 0. , 2.4 , -4.9 , 1.6 ],
[ 0.9 , 0. , 0. , 0. , 0. , 0. , 1.6 , -2.5 ]])
The problem arises when I sum each row: every row should sum to exactly 0. But when I do:
np.sum(sp_m.toarray(), axis=1)
it gives:
array([ 0.00000000e+00, 0.00000000e+00, 1.11022302e-16, -4.44089210e-16,
4.44089210e-16, 4.44089210e-16, -3.33066907e-16, 1.11022302e-16])
This appears to happen because np.sum, once there are enough elements, adds them pairwise rather than one by one. For example, for the last row:
(0.9 + 0.) + (0. + 0.) + (0. + 0.) + (1.6 + (-2.5))
1.1102230246251565e-16
and
0.9 + 0. + 0. + 0. + 0. + 0. + 1.6 + (-2.5)
0.0
are not the same, although they should be.
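To make the comparison above reproducible, here is a minimal sketch that evaluates the last row in both orders. The names last_row, sequential, pairwise and ATOL are only for illustration, and the tolerance value 1e-12 is an assumption on my part, not something my analysis prescribes. It also prints the machine epsilon for float64, since the nonzero results I see are all small multiples of it.

```python
import numpy as np

# Last row of the matrix above (hypothetical variable name).
last_row = np.array([0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 1.6, -2.5])

# Left-to-right accumulation, one element at a time.
sequential = 0.0
for x in last_row:
    sequential += x

# The same numbers grouped in pairs, as written out above.
pairwise = (0.9 + 0.0) + (0.0 + 0.0) + (0.0 + 0.0) + (1.6 + (-2.5))

print(sequential)           # 0.0
print(pairwise)             # 1.1102230246251565e-16
print(np.finfo(float).eps)  # spacing of float64 values near 1.0

# Assumed tolerance for "equal to zero"; the right value depends on
# the scale of the data and is a guess here.
ATOL = 1e-12
print(np.isclose(pairwise, 0.0, atol=ATOL))
```

With a tolerance like this, both orders of evaluation would count as zero, but I am not sure whether comparing against a tolerance is the right approach for my case.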
Does anyone know how to work around this behavior of numpy? My analysis relies on the row sums being exactly 0, so I can't continue until I solve this.