np.sum on sparse matrix gives unexpected results

Question

I have this smaller example of my sparse matrix:

import numpy as np
from scipy.sparse import csr_matrix

data = np.array([-0.05, 0.025, 0.025, -0.05, 0.025, 0.025, 0.9, -2.50, 1.6, 0.9, 1.6, -4.9, 2.4, 2.4, -5.6, 3.2, 3.2, -5.6, 2.4, 0.9, 2.4, -4.9, 1.6, 0.9, 1.6, -2.5])
row = np.array([0,0,0,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,5,6,6,6,6,7,7,7])
col = np.array([0,2,6,1,3,7,0,2,3,1,2,3,4,3,4,5,4,5,6,1,5,6,7,0,6,7])

# Build the CSR matrix from (data, (row, col)) triplets
sp_m = csr_matrix((data, (row, col)))

Converted to a dense array with `sp_m.toarray()`, it looks like this:

array([[-0.05 ,  0.   ,  0.025,  0.   ,  0.   ,  0.   ,  0.025,  0.   ],
       [ 0.   , -0.05 ,  0.   ,  0.025,  0.   ,  0.   ,  0.   ,  0.025],
       [ 0.9  ,  0.   , -2.5  ,  1.6  ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.9  ,  1.6  , -4.9  ,  2.4  ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  2.4  , -5.6  ,  3.2  ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   ,  3.2  , -5.6  ,  2.4  ,  0.   ],
       [ 0.   ,  0.9  ,  0.   ,  0.   ,  0.   ,  2.4  , -4.9  ,  1.6  ],
       [ 0.9  ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  1.6  , -2.5  ]])

The problem appears when I sum each row: every row should sum to 0. When I do:

np.sum(sp_m.toarray(), axis=1)

it gives:

array([ 0.00000000e+00,  0.00000000e+00,  1.11022302e-16, -4.44089210e-16,
        4.44089210e-16,  4.44089210e-16, -3.33066907e-16,  1.11022302e-16])

This appears to happen because, beyond a certain number of elements, NumPy's sum adds them pairwise rather than one by one. For example, for the last row:

(0.9 + 0.) + (0. + 0.) + (0. + 0.) + (1.6 + (-2.5))
1.1102230246251565e-16

and

0.9 + 0. + 0. + 0. + 0. + 0. + 1.6 + (-2.5)
0.0

are not the same, although they should be.
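
The order dependence can be reproduced with plain Python floats, without NumPy. The sketch below uses only the three nonzero entries of the last row. It also calls `math.fsum`, which returns the correctly rounded sum of the values as they are actually stored in binary. The variable names are illustrative and not part of the original code.

```python
import math

# The three nonzero entries of the last row.
a, b, c = 0.9, 1.6, -2.5

# Left to right: 0.9 + 1.6 rounds to exactly 2.5, so the result is 0.0.
left_to_right = (a + b) + c

# Grouped differently: 1.6 + (-2.5) is computed first and is not exactly -0.9.
grouped = a + (b + c)

# Correctly rounded sum of the stored binary values.
exact = math.fsum([a, b, c])

print(left_to_right)  # 0.0
print(grouped)        # 1.1102230246251565e-16
print(exact)          # 1.1102230246251565e-16
```

Neither 0.9 nor 1.6 is exactly representable in binary, and the stored values, taken together with -2.5, sum to 2**-53 rather than 0. The 0.0 from the left-to-right order is therefore itself a product of rounding, not the exact result.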

Does anyone know how to work around this NumPy behavior? My analysis relies on each row summing to exactly 0, so I can't continue until I solve this.

It appears that you understand that, for example, `1.1102230246251565e-16` is extremely close to zero. The short answer is that you cannot reasonably expect exact results with floating-point arithmetic, by design. The problem is at the hardware level and is an expected consequence of deliberate design decisions. — Karl Knechtel, Nov 17 '21 at 16:03
I need these to be floating points. The values are really close to 0, but since I need exponentials later on in the analysis, the results are not equal to 1 as expected and they throw off my analysis. — Ioanna K., Nov 17 '21 at 16:06
You could have done `sp_m.sum( axis=1)` though it doesn't make a difference in the numerical values. — hpaulj, Nov 17 '21 at 16:27
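
Following the comments above, here is a minimal sketch of one possible workaround, not a definitive fix. It sums the rows directly on the sparse matrix, compares them to zero with an absolute tolerance instead of `==`, and snaps near-zero sums to exactly 0 before exponentiating. The tolerance `tol` and the names `row_sums`, `cleaned`, and `exp_of_sums` are assumptions made for illustration; a suitable tolerance depends on the magnitude of the data.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([-0.05, 0.025, 0.025, -0.05, 0.025, 0.025, 0.9, -2.50, 1.6,
                 0.9, 1.6, -4.9, 2.4, 2.4, -5.6, 3.2, 3.2, -5.6, 2.4,
                 0.9, 2.4, -4.9, 1.6, 0.9, 1.6, -2.5])
row = np.array([0,0,0,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,5,6,6,6,6,7,7,7])
col = np.array([0,2,6,1,3,7,0,2,3,1,2,3,4,3,4,5,4,5,6,1,5,6,7,0,6,7])
sp_m = csr_matrix((data, (row, col)))

# Sum each row on the sparse matrix itself; flatten the (8, 1) result to 1-D.
row_sums = np.asarray(sp_m.sum(axis=1)).ravel()

# Hypothetical tolerance: far above the ~1e-16 rounding noise,
# far below the smallest meaningful entry (0.025).
tol = 1e-12

# Test "sums to zero" with a tolerance rather than exact equality.
rows_sum_to_zero = np.allclose(row_sums, 0.0, atol=tol)

# Snap sums that are zero up to rounding noise to exactly 0.0,
# so that a later exponential gives exactly 1.0.
cleaned = np.where(np.abs(row_sums) < tol, 0.0, row_sums)
exp_of_sums = np.exp(cleaned)

print(rows_sum_to_zero)
print(exp_of_sums)
```

Snapping is only appropriate when values below `tol` are known to be rounding noise; if the later analysis can tolerate it, comparing with `np.isclose` at each step avoids modifying the data at all.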
