Before using Cython, you should optimize your code with Numpy. Here, vectorizing the third and second inner for
loops, yields a x40 speed-up,
In [1]: import numpy as np
...: import itertools
...:
...: # define Numaire and Values functions from the question above
...:
...: def Values2(values, i1, i0=0, numeraire=Numeraire(0.)):
...: factors=len(values.shape)
...: norm=0.5**factors
...: k = np.array(list(itertools.product(np.arange(2), repeat=factors)))
...: for i in np.arange(i1-1, i0-1, -1):
...: j = np.array(list(itertools.product(np.arange(i+1), repeat=factors)))
...: mask_all = j[:,:,np.newaxis] + k.T[np.newaxis, :, :]
...: mask_x, mask_y = np.swapaxes(mask_all, 2, 1).reshape(-1, 2).T
...:
...: values_tmp = values[mask_x, mask_y].reshape((j.shape[0], k.shape[0]))
...: values_tmp = values_tmp.sum(axis=1)
...: values[j[:,0], j[:,1]] = values_tmp*norm*numeraire(i+1, i, j)
...: return values
...:
...: factors = 2
...: i = 60
...: values = lambda : np.ones([i+1]*factors)
...: print values()[(0,)*factors], np.exp(-0.05/12*i)
...:
...: res = Values(values(), i, numeraire=Numeraire(0.05/12))
...: res2 = Values2(values(), i, numeraire=Numeraire(0.05/12))
...: np.testing.assert_allclose(res, res2)
...:
...: %timeit Values(values(), i, numeraire=Numeraire(0.05/12))
...: %timeit Values2(values(), i, numeraire=Numeraire(0.05/12))
...:
1.0 0.778800783071
1 loops, best of 3: 1.26 s per loop
10 loops, best of 3: 31.8 ms per loop
The next step would be to replace the line,
j = np.array(list(itertools.product(np.arange(i+1), repeat=factors)
with it's Numpy equivalent, taken from this answer (not very pretty),
def itertools_product_numpy(some_list, some_length):
return some_list[np.rollaxis(
np.indices((len(some_list),) * some_length), 0, some_length + 1)
.reshape(-1, some_length)]
k = itertools_product_numpy(np.arange(i+1), factors)
this result in an overall x160 speed up and the code runs in 1.2 second on my laptop for i=360
and factors = 2
.
In this last version, I don't think that you will get much speed up, if you port it to Cython, since there is just one loop remaining and it has only ~360 iterations. Rather, some fine-tuned Python/Numpy optimizations should be performed to get a further speed increase.
Alternatively, you can try applying Cython to your original implementation. However because it is based on itertools.product
, which is slow when called repeatedly in a loop, Cython will not help there.