arrayfun equovalent for numpy, in terms of computation performance

Question

When writing both in Octave and in MATLAB, arrayfun is encouraged, in order to not create brevity in the code, but also speed. This is unlike what is in the discussion of the following node, which only talks about styling, not computation performance.
See Octave code below:

function ret = vect_vs_array_fun(n)
  a=1:n;
  tic;
  for i=1:n
    a(i)=sin(i)/7;
  endfor
  toc;
  a=1:n;
  tic;
  a=arrayfun( @sin, 1:n ) / 7;
  toc;
  a=1:n;
  tic;
  for i=1:n
    a(i)=sin(i)/7;
  endfor
  toc;  
endfunction

When running the function for a large enough value, one can see the runtime difference:

vect_vs_array_fun(100000)
Elapsed time is 0.627594 seconds.
Elapsed time is 0.122411 seconds.
Elapsed time is 0.623537 seconds.

The more nested for loops are replaced by arrayfun, the relative faster it will get.

The question is: is there something equivalent, performance-wise for Python/NumPy, and in particular, its array?

You can apply `sin` directly on vectors in Matlab/Octave: `a=sin(1:n)/7`. `@arrayfun` gives you some brevity for custom functions (though, ideally you'd implement them in a way that they work on vectors directly) — chtz, Jun 08 '20 at 09:34
Note that the context for this question applies *only* to Octave, not Matlab, and it does so because Octave doesn't have a JIT that optimizes `for` loops like Matlab does (and maybe lacks Matlab's "in-place operations" optimization). In Matlab, `arrayfun` is just syntactic sugar for iteration, and actually performs the same as or much slower than `for` loops. (Unless you're using Matlab's GPU acceleration and its dedicated arrayfun variant.) `arrayfun` does *not* vectorize operations. — Andrew Janke, Jun 08 '20 at 14:37
Note: you're cheating with the `arrayfun` construction here, because in the arrayfun version, the `/` is doing a single vectorized operation over the entire array, where the `a(i)=sin(i)/7` are doing a separate division call for each element. The real equivalent would be `a=arrayfun( @(x) sin(x)/7, 1:n );`. — Andrew Janke, Jun 08 '20 at 14:43
@AndrewJanke this is not true. Octave optimises arrayfun when using built-in functions. OP is doing exactly the right thing in the arrayfun call. If they wrapped into an anonymous function they would incur a penalty, effectively making performance equal to a for loop. — Tasos Papastylianou, Jun 11 '20 at 17:59

score 0 · Answer 1 · answered Jun 08 '20 at 08:51

0

Yes, just use numpy functions on arrays:

np.sin(np.arange(1,n))

Speed comparison:

import numpy as np
n = 100000

%timeit np.sin(np.arange(1,n))
#1.02 ms ± 6.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

def loop(n):
    for i in range(1,n):
        np.sin(i)

%timeit loop(n)
#107 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

answered Jun 08 '20 at 08:51

Stef

28,728
2
24
52

Will it help if I am using a Python written procedure? – user1134991 Jun 08 '20 at 08:56
1

not really: although there is a numpy function [`vectorize`](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html) it doen't provide performance improvement (see note in the docs) – Stef Jun 08 '20 at 09:00

score 0 · Answer 2 · answered Jun 11 '20 at 21:20

Obviously if you can vectorise you should vectorise, but presumably your example here was just a poor choice of example, and what you're really after is a way to apply arbitrary, non-vectorized functions, elementwise on arrays.

As an example, let's use octave's nchoosek function, which cannot be vectorised, and therefore you'd have to use arrayfun, like so:

a = [2:10].' * [2:10];
arrayfun( @nchoosek, a, 4 )

Python pre-3.8 doesn't have an equivalent function to nchoosek, so I'm using this one:

import operator as op
from functools import reduce

def ncr(n, r):
    r = min(r, n-r)
    numer = reduce(op.mul, range(n, n-r, -1), 1)
    denom = reduce(op.mul, range(1, r+1), 1)
    return numer // denom

Now on with our benchmarks:

import time
import numpy

# same example array as in the octave example
a = numpy.arange(2, 11).reshape(-1,1); a = a @ a.T

# numpy.vectorize
def benchmark1():
  vncr = numpy.vectorize( ncr )
  tic = time.time()
  for i in range(100000):
    b = vncr( a, 4 )
  toc = time.time()
  print( f"Elapsed: {toc-tic}s" )

# list comprehension
def benchmark2():
  a1 = a.reshape(-1).tolist()
  tic = time.time()
  for i in range(100000):
    b = [ ncr( ai, 4 ) for ai in a1 ]
  toc = time.time()
  print( f"Elapsed: {toc-tic}s" )

# explicit for loop with preallocation
def benchmark3():
  b = numpy.empty( a.reshape(-1).shape )
  a1 = a.reshape(-1)
  tic = time.time()
  for i in range(100000):
    for j,k in enumerate(a1):
      b[j] = ncr(k, 4)
  toc = time.time()
  print( f"Elapsed: {toc-tic}s" )

# map function
def benchmark4():
  a1 = a.reshape(-1).tolist()
  reps = [4]*len(a1)
  tic = time.time()
  for i in range(100000):
    b = list( map( ncr, a1, reps ) )
  toc = time.time()
  print( f"Elapsed: {toc-tic}s" )

benchmark1()
benchmark2()
benchmark3()
benchmark4()

Outputs (on my machine):

Elapsed: 19.005178928375244s
Elapsed: 16.108781337738037s
Elapsed: 31.94666314125061s
Elapsed: 14.685683965682983s

arrayfun equovalent for numpy, in terms of computation performance

2 Answers2