A variation on your solution that uses [:,None] instead of repeat:
np.prod(A,axis=1)[:,None]/A
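To make the broadcasting explicit, here is a minimal sketch. The sample array A is an assumption (the answer does not show it; the values are chosen to be consistent with the paulj(A) output further down), and the repeat-based line is only a guess at what the question's version might look like.

```python
import numpy as np

# Assumed sample input; not shown in the original answer.
A = np.array([[0.2, 0.4, 0.6],
              [0.5, 0.5, 0.5],
              [0.6, 0.4, 0.2]])

row_prod = np.prod(A, axis=1)          # shape (3,)

# [:, None] turns the row products into a (3, 1) column, which
# broadcasts against the (3, 3) array A during the division.
P_broadcast = row_prod[:, None] / A

# Hypothetical repeat-based equivalent: build the full (3, 3) array first.
P_repeat = np.repeat(row_prod, A.shape[1]).reshape(A.shape) / A

print(np.allclose(P_broadcast, P_repeat))
```

Both forms give the same result; the broadcast version just avoids materialising the repeated array.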
My first stab at handling 0s is:
In [21]: B
array([[ 0.2, 0.4, 0.6],
[ 0. , 0.5, 0.5],
[ 0.6, 0.4, 0.2]])
In [22]: np.prod(B,axis=1)[:,None]/(B+np.where(B==0,1,0))
array([[ 0.24, 0.12, 0.08],
[ 0. , 0. , 0. ],
[ 0.08, 0.12, 0.24]])
But as the comment pointed out, the [1,0] cell should be 0.25.
The following corrects that problem, but breaks when a row contains multiple 0s:
In [30]: I=B==0
In [31]: B1=B+np.where(I,1,0)
In [32]: B2=np.prod(B1,axis=1)[:,None]/B1
In [33]: B3=np.prod(B,axis=1)[:,None]/B1
In [34]: np.where(I,B2,B3)
Out[34]:
array([[ 0.24, 0.12, 0.08],
[ 0.25, 0. , 0. ],
[ 0.08, 0.12, 0.24]])
In [55]: C
array([[ 0.2, 0.4, 0.6],
[ 0. , 0.5, 0. ],
[ 0.6, 0.4, 0.2]])
In [64]: np.where(I,sum1[:,None],sum[:,None])/C1
array([[ 0.24, 0.12, 0.08],
[ 0.5 , 0. , 0.5 ],
[ 0.08, 0.12, 0.24]])
Blaz Bratanic's epsilon approach is the best non-iterative solution (so far):
In [74]: np.prod(C+eps,axis=1)[:,None]/(C+eps)
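A self-contained sketch of the epsilon idea follows. The function name eps_product and the value eps=1e-10 are assumptions for illustration; the answer does not say which eps was used. The arrays B and C are the ones shown above.

```python
import numpy as np

def eps_product(A, eps=1e-10):
    """Approximate, for each cell, the product of the other cells in its row.

    Adding a small eps (an assumed value) keeps the division defined
    when A contains zeros, at the cost of a small error in the result.
    """
    shifted = A + eps
    return np.prod(shifted, axis=1)[:, None] / shifted

B = np.array([[0.2, 0.4, 0.6],
              [0.0, 0.5, 0.5],
              [0.6, 0.4, 0.2]])
C = np.array([[0.2, 0.4, 0.6],
              [0.0, 0.5, 0.0],
              [0.6, 0.4, 0.2]])

# Results are only approximate, so round them for display.
print(np.round(eps_product(B), 6))
print(np.round(eps_product(C), 6))
```

The results are approximate: cells that should be exactly 0 come out as tiny positive numbers, which is the rounding issue to keep in mind.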
A different solution iterating over the columns:
def paulj(A):
    P = np.ones_like(A)
    # multiply in every cyclic shift of the columns except the identity
    for i in range(1,A.shape[1]):
        P *= np.roll(A, i, axis=1)
    return P
In [130]: paulj(A)
array([[ 0.24, 0.12, 0.08],
[ 0.25, 0.25, 0.25],
[ 0.08, 0.12, 0.24]])
In [131]: paulj(B)
array([[ 0.24, 0.12, 0.08],
[ 0.25, 0. , 0. ],
[ 0.08, 0.12, 0.24]])
In [132]: paulj(C)
array([[ 0.24, 0.12, 0.08],
[ 0. , 0. , 0. ],
[ 0.08, 0.12, 0.24]])
I tried some timings on a large matrix:
In [13]: A=np.random.randint(0,100,(1000,1000))*0.01
In [14]: timeit paulj(A)
1 loops, best of 3: 23.2 s per loop
In [15]: timeit blaz(A)
10 loops, best of 3: 80.7 ms per loop
In [16]: timeit zwinck1(A)
1 loops, best of 3: 15.3 s per loop
In [17]: timeit zwinck2(A)
1 loops, best of 3: 65.3 s per loop
The epsilon approximation is probably the best speed we can expect, but it has some rounding issues. Having to iterate over many columns hurts the speed. I'm not sure why the np.prod(A[:,mask], 1) approach is slowest.
eeclo (https://stackoverflow.com/a/22441825/901925) suggested using as_strided. Here's what I think he has in mind (adapted from an overlapping block question, https://stackoverflow.com/a/8070716/901925):
def strided(A):
    h,w = A.shape
    A2 = np.hstack([A,A])
    x,y = A2.strides
    strides = (y,x,y)
    shape = (w, h, w-1)
    blocks = np.lib.stride_tricks.as_strided(A2[:,1:], shape=shape, strides=strides)
    P = blocks.prod(2).T # faster to prod on last dim
    # alt: shape = (w-1, h, w), and P=blocks.prod(0)
    return P
Timing for the (1000,1000) array is quite an improvement over the column iterations, though still much slower than the epsilon approach.
In [153]: timeit strided(A)
1 loops, best of 3: 2.51 s per loop
Another indexing approach, while relatively straightforward, is slower and runs into memory errors sooner:
def foo(A):
    h,w = A.shape
    I = (np.arange(w)[:,None]+np.arange(1,w))
    I1 = np.array(I)%w
    P = A[:,I1].prod(2)
    return P
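As a sanity check, the three exact approaches (paulj, strided, foo) can be compared against a brute-force computation. This is only a sketch: the helper name reference and the random test array are assumptions added for illustration, while the other three functions are restated from above so that the block runs on its own.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def reference(A):
    """Hypothetical brute-force check: drop column j, multiply the rest."""
    h, w = A.shape
    cols = [np.delete(A, j, axis=1).prod(axis=1) for j in range(w)]
    return np.column_stack(cols)

def paulj(A):
    P = np.ones_like(A)
    for i in range(1, A.shape[1]):
        P *= np.roll(A, i, axis=1)
    return P

def strided(A):
    h, w = A.shape
    A2 = np.hstack([A, A])
    x, y = A2.strides
    # blocks[i, j, k] views A[j, (1 + i + k) % w]: row j without column i
    blocks = as_strided(A2[:, 1:], shape=(w, h, w - 1), strides=(y, x, y))
    return blocks.prod(2).T

def foo(A):
    h, w = A.shape
    I1 = (np.arange(w)[:, None] + np.arange(1, w)) % w
    return A[:, I1].prod(2)

C = np.array([[0.2, 0.4, 0.6],
              [0.0, 0.5, 0.0],
              [0.6, 0.4, 0.2]])
R = np.random.default_rng(0).random((5, 7))   # assumed extra test case

for X in (C, R):
    expected = reference(X)
    for f in (paulj, strided, foo):
        assert np.allclose(f(X), expected), f.__name__
```

Unlike the epsilon version, all three handle rows with several 0s exactly, because none of them divides by the array.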