2

Consider the following function:

def dostuff(n, f):
    array = numpy.arange(0, n)
    for i in range(1, n):                    # Line 1                      
        array[i] = f(array[i-1], array[i])   # Line 2
    return numpy.sum(array)

How can I rewrite the Line 1/Line 2 to make the loop faster in python 3 (without using cython)?

Nader Hisham
  • 5,214
  • 4
  • 19
  • 35
Vincent
  • 57,703
  • 61
  • 205
  • 388
  • Have you taken a look at [`numpy.vectorize`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html)? Docs say it's primarily about convenience, but it might gain a little in some cases. – ShadowRanger Nov 06 '15 at 04:29
  • How would you do that with Python vectorize? – Vincent Nov 06 '15 at 04:36
  • is your function `f` completely a custom function ? or it might be implemented by `numpy` somewhere ? can you share it's implementation – Nader Hisham Nov 06 '15 at 04:41
  • f can be anything taking two integers and returning an integer – Vincent Nov 06 '15 at 04:54
  • Looks to me like the use-case is too simple to really take advantage of `numpy`'s strengths. I've not timed, but maybe something like `sum(f(x, x+1) for x in range(n-1))` will actually be faster? – zehnpaard Nov 06 '15 at 05:58
  • @zehnpaard actually your code is not producing the desired result , as the function needs to accumulate results – Nader Hisham Nov 06 '15 at 06:56
  • @NaderHisham Ah yes you're quite right. – zehnpaard Nov 06 '15 at 07:13

1 Answers1

2

I encourage you to check this question on SO generalized cumulative functions in NumPy/SciPy? , since you want a generalized cumulative function .

also check scipy documentation for the function frompyfunc Here

func = np.frompyfunc(f , 2 , 1)

def dostuff(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=np.object).astype(np.int)
    return np.sum(final_array)

Example


In [86]:
def f(num1 , num2):
    return num1 + num2

In [87]:
func = np.frompyfunc(f , 2 , 1)

In [88]:
def dostuff(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=np.object).astype(np.int)
    return np.sum(final_array)

In [108]:
dostuff(15,f)
Out[108]:
560

In [109]:
dostuff(10,f)
Out[109]:
165

Benchmarks


def dostuff1(n, f):
    array = np.arange(0, n)
    for i in range(1, n):                    # Line 1                      
        array[i] = f(array[i-1], array[i])   # Line 2
    return np.sum(array)

def dostuff2(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=np.object).astype(np.int)
    return np.sum(final_array)

In [126]:
%timeit dostuff1(100,f)
10000 loops, best of 3: 40.6 µs per loop

In [127]:
%timeit dostuff2(100,f)
The slowest run took 4.98 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 23.8 µs per loop
Community
  • 1
  • 1
Nader Hisham
  • 5,214
  • 4
  • 19
  • 35