
I am working with large matrices (up to a million × a million). I want to sum each column of a matrix and put the reciprocal of each column sum into that column's non-zero elements. I have made two attempts at this, but I still want a faster method of computation. Since some column sums are zero, I can't apply np.reciprocal directly. Here are my attempts:

import numpy as np

A=np.array([[0,1,1,1],[0,0,1,0],[0,1,0,0],[0,0,0,0]])

# column sums, stored as floats so the division below is a true division
V=A.sum(axis=0).astype(float)

# suppress the divide-by-zero warnings raised for all-zero columns
with np.errstate(divide='ignore', invalid='ignore'):
    Vs = np.true_divide( 1, V )
    Vs[ ~ np.isfinite( Vs )] = 0  # -inf inf NaN

print(Vs)

Second attempt:

A=np.array([[0,1,1,1],[0,0,1,0],[0,1,0,0],[0,0,0,0]])
d=A.shape[1]  # number of columns

# column sums, stored as floats
V=A.sum(axis=0).astype(float)

# invert only the non-zero sums, one element at a time
for i in range(d):
    if V[i]!=0:
        V[i]=1/V[i]
print(V)

Is there a faster way than this? My running time is very poor. Thanks.

Edit 1: Do you think converting everything to CSR sparse matrix format would make it faster?

Code_ninja
  • What's the slow part? The sum? The divide? The testing? For large `d` I expect the iterative version to be quite slow. Unless your matrix is very sparse (10% non-zero or less), sparse matrices won't help. Also, the sparse row sum returns a dense matrix. – hpaulj Jun 25 '17 at 19:02

1 Answer

The question "NumPy: Return 0 with divide by zero" discusses various divide-by-zero options. The accepted answer there looks a lot like your first try, but there's a newer answer that might (?) be faster:

https://stackoverflow.com/a/37977222/901925

In [240]: V=A.sum(axis=0)
In [241]: np.divide(1,V,out=np.zeros(V.shape),where=V>0)
Out[241]: array([ 0. ,  0.5,  0.5,  1. ])

Your example is too small to make meaningful time tests on. I don't have any intuition about the relative speeds (beyond my comment).
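For a rough comparison on something larger, a sketch like the following could be used. The matrix size, the density, and the function names `reciprocal_masked` and `reciprocal_errstate` are my own assumptions for illustration, and `np.random.default_rng` needs NumPy 1.17 or later. It only times the reciprocal step, not the column sum, and the numbers will vary by machine.

```python
import timeit
import numpy as np

def reciprocal_masked(col_sums):
    """1/col_sums where col_sums != 0, and 0 elsewhere (masked divide)."""
    out = np.zeros(col_sums.shape, dtype=float)
    np.divide(1.0, col_sums, out=out, where=col_sums != 0)
    return out

def reciprocal_errstate(col_sums):
    """Same result via divide-everything, then zero out inf/NaN."""
    with np.errstate(divide='ignore', invalid='ignore'):
        inv = np.true_divide(1.0, col_sums)
    inv[~np.isfinite(inv)] = 0.0
    return inv

# Hypothetical test data: a mostly-zero 0/1 matrix, so some columns sum to 0.
rng = np.random.default_rng(0)
A = (rng.random((2000, 2000)) < 0.001).astype(np.int64)
V = A.sum(axis=0)

# Both approaches should agree before their timings are compared.
assert np.array_equal(reciprocal_masked(V), reciprocal_errstate(V))

for fn in (reciprocal_masked, reciprocal_errstate):
    seconds = timeit.timeit(lambda: fn(V), number=1000)
    print(fn.__name__, seconds)
```

If the column sum itself turns out to dominate the run time, the choice between these two reciprocal variants will matter very little.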

A recent SO question pointed out that the `out` parameter is required with `where` in the latest release (1.13) but optional in earlier ones.
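On the sparse question from your edit, here is a minimal sketch of how the same idea might look with `scipy.sparse`, in case the matrix really is very sparse. I'm assuming a 0/1 matrix like your example, so that scaling each column by its reciprocal sum is the same as writing the reciprocal into the non-zero entries. The choice of CSC format and the `diags` product are my assumptions, not something I have timed.

```python
import numpy as np
import scipy.sparse as sp

# The example matrix from the question, stored in a sparse column format.
A = sp.csc_matrix(np.array([[0, 1, 1, 1],
                            [0, 0, 1, 0],
                            [0, 1, 0, 0],
                            [0, 0, 0, 0]], dtype=float))

# The sparse sum comes back as a dense 1 x n matrix; flatten it to a 1-D array.
col_sums = np.asarray(A.sum(axis=0)).ravel()

# Masked reciprocal: columns that sum to zero keep a factor of 0.
inv = np.divide(1.0, col_sums, out=np.zeros_like(col_sums), where=col_sums != 0)

# Right-multiplying by a diagonal matrix scales column j by inv[j];
# only the stored non-zero entries are involved.
scaled = A @ sp.diags(inv)

print(scaled.toarray())
```

For a dense array the equivalent scaling would just be broadcasting, `A * inv`.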

hpaulj