I have the following data for a Python program.
import numpy as np
np.random.seed(28)
n = 100000
d = 60
S = np.random.rand(n)
O = np.random.rand(n, d, d)
p = np.random.rand()
mask = np.where(S < 0.5)
And I want to run the following algorithm:
def method1():
    sum_p = np.zeros([d, d])
    sum_m = np.zeros([d, d])
    for k in range(n):
        s = S[k] * O[k]
        sum_p += s
        if S[k] < 0.5:
            sum_m -= s
    return p * sum_p + sum_m
This is a minimal example, but method1() will be run many times in my project, so I would like to rewrite it in a more Pythonic way and make it as efficient as possible. I have tried the following method:
def method2():
    sall = S[:, None, None] * O
    return p * sall.sum(axis=0) - sall[mask].sum(axis=0)
Although this method performs better for low values of d, it is slower than method1 when d=60:
# To check that both methods provide the same result.
In [1]: np.sum(method1() == method2()) == d*d
Out[1]: True
In [2]: %timeit method1()
Out[2]: 801 ms ± 2.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit method2()
Out[3]: 1.91 s ± 6.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Do you have any other ideas to optimize this method?
(As additional information, the variable mask is used in other parts of my final code, so its computation does not need to be counted in the timing of method2.)
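
For reference, the loop in method1 is a weighted sum of the slices O[k]: slice k is scaled by S[k] * p, minus S[k] again when S[k] < 0.5. It may therefore be possible to fold both sums into one weight vector and a single contraction over the first axis. The sketch below is only an illustration under assumptions: method3 and w are hypothetical names, the sizes are reduced so it runs quickly, and no timing is claimed.

```python
import numpy as np

# Reduced sizes compared with the question so the sketch runs quickly (assumption).
rng = np.random.default_rng(28)
n, d = 1000, 6
S = rng.random(n)
O = rng.random((n, d, d))
p = rng.random()


def method1():
    """Reference loop, same logic as in the question."""
    sum_p = np.zeros([d, d])
    sum_m = np.zeros([d, d])
    for k in range(n):
        s = S[k] * O[k]
        sum_p += s
        if S[k] < 0.5:
            sum_m -= s
    return p * sum_p + sum_m


def method3():
    """Hypothetical variant: one weight per slice, then one contraction."""
    # Slice k gets weight S[k] * (p - 1) when S[k] < 0.5, else S[k] * p.
    w = S * (p - (S < 0.5))
    # Contract w against the first axis of O; the result has shape (d, d).
    return np.tensordot(w, O, axes=1)


# Floating-point sums are accumulated in a different order,
# so compare with a tolerance rather than with ==.
print(np.allclose(method1(), method3()))
```

Whether this is faster than method1 for n = 100000 and d = 60 would have to be measured with %timeit on the real data.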