-1

I have a function that takes a two-element array as input. I have a large dataset (shape = (360000, 2)) and want to evaluate the function at each point using numpy.apply_along_axis. One of the answers in this thread (numpy np.apply_along_axis function speed up?) says that numpy.apply_along_axis is not meant for speed. My function is vectorised. How can I reduce the evaluation time over all my data without using jit/Cython?

Here is sample code showing exactly what I'm trying to do:

import numpy as np

def sample(x):
    # x is a two-element array (one row of data)
    return np.sin(x[0]) * np.cos(x[1])

data = np.random.normal(size=600*600*2)
data = data.reshape(600*600, 2)
%timeit np.sum(np.apply_along_axis(sample, 1, data))  # using apply_along_axis

def loop_way():  # using a plain Python loop over the rows
    result = []
    for row in data:
        result.append(sample(row))
    return np.sum(result)
%timeit loop_way()


Output when using np.apply_along_axis: 1 loop, best of 3: 4.06 s per loop

Output for the loop_way function: 1 loop, best of 3: 2.41 s per loop
Aditya Kurrodu
  • With 3 or more dimensions, `apply_along_axis` is a convenience, easier to write than a double nested loop over the other dimensions. But with only one loop that doesn't matter. Your `loop_way` is missing a step, if you want a fair comparison, a `np.array(result)` conversion. – hpaulj Jun 06 '19 at 03:43
  • @hpaulj Yeah, I corrected the function. Thanks. – Aditya Kurrodu Jun 06 '19 at 03:47

1 Answer

3

np.sin, np.cos and * are all vectorized operations, so you can apply them to whole arrays at once instead of row by row:

np.sin(data[:, 0]) * np.cos(data[:, 1])

data[:, 0] is the first column and data[:, 1] is the second.
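As a quick sanity check, here is a minimal sketch comparing the column-wise expression with the row-by-row result on a small random array. The helper name `sample_vectorized`, the seed and the array size are assumptions made for illustration; they are not part of the question.

```python
import numpy as np

def sample(x):
    # Row-wise function from the question: x holds two elements.
    return np.sin(x[0]) * np.cos(x[1])

def sample_vectorized(points):
    # Hypothetical helper: points has shape (n, 2); works on whole columns.
    return np.sin(points[:, 0]) * np.cos(points[:, 1])

rng = np.random.default_rng(0)        # assumed seed, for reproducibility only
data = rng.normal(size=(1000, 2))     # assumed small size to keep the check quick

row_wise = np.apply_along_axis(sample, 1, data)
vectorized = sample_vectorized(data)

# Passing the transpose reuses the original function unchanged:
# data.T has shape (2, n), so x[0] and x[1] are the two columns.
via_transpose = sample(data.T)

same = np.allclose(row_wise, vectorized) and np.allclose(row_wise, via_transpose)
print(same)
```

If the function only indexes `x[0]` and `x[1]` and otherwise uses NumPy operations, the `sample(data.T)` form may be the smallest change, since the function body stays as it is.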

Note that this should go really fast :)


Here is a notebook that tests the speed of each method: notebook.

Average run time:

  • Method 1 (using numpy.apply_along_axis): 2.08s
  • Method 2 (loop applying function to rows): 1.14s
  • Method 3 (this answer): 17.3ms
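For anyone who wants to reproduce this kind of comparison outside a notebook, the following is a rough sketch using the standard `timeit` module rather than the `%timeit` magic. It is not the linked notebook's code. The function names `apply_way` and `vectorized_way`, the seed and the reduced row count are assumptions, and absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

def sample(x):
    return np.sin(x[0]) * np.cos(x[1])

rng = np.random.default_rng(0)
# Assumption: 20,000 rows instead of the question's 360,000, to keep the run short.
data = rng.normal(size=(20_000, 2))

def apply_way():
    # Method 1: numpy.apply_along_axis calls sample once per row.
    return np.sum(np.apply_along_axis(sample, 1, data))

def loop_way():
    # Method 2: explicit Python loop over the rows.
    return np.sum(np.array([sample(row) for row in data]))

def vectorized_way():
    # Method 3: whole-column operations, no Python-level loop.
    return np.sum(np.sin(data[:, 0]) * np.cos(data[:, 1]))

timings = {}
for name, func in [("apply_along_axis", apply_way),
                   ("loop", loop_way),
                   ("vectorized", vectorized_way)]:
    # Best of 3 single runs, similar in spirit to %timeit's "best of 3".
    timings[name] = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name}: {timings[name] * 1e3:.2f} ms")
```

All three functions should return the same sum up to floating-point rounding; only the time spent in Python-level calls differs.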
araraonline