1

I am not sure if this has been asked before. I couldn't find many relevant results on SO or Google. Anyway, here is what I am trying to do. I have a function that is created at runtime which takes in 4 parameters.

my_func(t, x, p, a)

t is a scalar float.
x is a 1D numpy array of floats.
p and a are dictionaries.

I have a numpy array T and a 2D numpy array X. I want to call the function with each element of T and each column of X and store the result in another numpy array. The function returns a numpy array that forms the column of the solution. My current, naive, very unpythonic implementation is as follows:

for i in range(len(T)):
    _u = my_func(T[i],X[:,i],p,a)
    sol.u[:,i] = _u   # The length of u is unrelated to the other variables

I could've used numexpr if it had allowed custom functions. I have also looked at numpy.vectorize and tried using it like this:

f = np.vectorize(my_func)
f.excluded.add(2)
f.excluded.add(3)
f(T,X,p,a)

This gives me the error "IndexError: invalid index to scalar variable" from somewhere inside the function. I am obviously not getting how numpy broadcasting works.

Is it possible to do this using np.vectorize?
Or is there some way to use np.apply_along_axis?

Another possibility I see is slicing up the arrays beforehand and forming a list of tuples with the right arguments and using them with some form of "map" functionality.

Update: Another solution I found:

f = lambda _t, _X: my_func(_t,_X,p,a)
u = np.array(list(map(f, T, list(X.T)))).T

It seems to work. But is that the most efficient way to do it?

Update 2: Here is a sample function:

def my_func(_t,_X,_p,_aux):
    [x,y,v,lamX,lamY,lamV,tf,] = _X[:7]

    g = _aux['const']['g']

    theta = -2*atan((lamX*v + sqrt(g**2*lamV**2 - 2*g*lamV*lamY*v + lamX**2*v**2 + lamY**2*v**2))/(g*lamV - lamY*v))
    return theta

where an example of _aux would be :

_aux = {'const': {'g':-9.81} }

_t and _p are not used in this case. It expects _X to be a 7-element vector. In this case, the function only returns one value. But there may be cases in which it returns a list. In that case, the returned value forms a column in the output. Hope that clarifies stuff a bit.

Newskooler
  • 3,973
  • 7
  • 46
  • 84
Thomas Antony
  • 544
  • 1
  • 7
  • 17

1 Answers1

2

Your for loop is not unpythonic. Python loves for loops. :) You mean, rather, that it probably does not make optimal use of numpy. It's un-numpy-onic?

I'm not sure that throwing it into blackboxes like numexpr, vectorize, or even apply_along_axis are more numpy-onic. The ideal is to understand the problem and work around/within that structure to make it deal with larger structures.

Let's try a sample function (like you should have given us?):

In [77]: def myfunction(t,x,p,a):
     print(t.shape)
     print(x.shape)
     print(p,a)
     return t*x

In [78]: f=np.vectorize(myfunction)
In [79]: f.excluded.add(2)   # so you can pass p,a 
In [80]: f.excluded.add(3)
In [81]: T=np.arange(5)
In [83]: X=np.ones((4,5))
In [85]: for i in range(T.shape[0]):
    print(myfunction(T[i],X[:,i],{},{}))
   ....:     
()
(4,)
{} {}
[ 0.  0.  0.  0.]
()
(4,)
{} {}
[ 1.  1.  1.  1.]
...

In [87]: f(T,X,{},{})
()
()
{} {}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-87-b585bb8fb6bc> in <module>()
----> 1 f(T,X,{},{})
...
/usr/lib/python3/dist-packages/numpy/lib/function_base.py in func(*vargs)
   1566                     the_args[_i] = vargs[_n]
   1567                 kwargs.update(zip(names, vargs[len(inds):]))
-> 1568                 return self.pyfunc(*the_args, **kwargs)
   1569 
   1570             vargs = [args[_i] for _i in inds]

<ipython-input-77-a72fc1b2ad5e> in myfunction(t, x, p, a)
      1 def myfunction(t,x,p,a):
----> 2      print(t.shape)
      3      print(x.shape)
      4      print(p,a)
      5      return t*x

AttributeError: 'int' object has no attribute 'shape'

So t is a scalar. I got a different error, but I think it's consistent with the one you got. Does your function use t[0] someplace?


Correction - your t can be a scalar, it is x that's supposed to be a vector. Your criptic This gives me the error "IndexError: invalid index to scalar variable" from somewhere inside the function. didn't help.


This T and X broadcast together just fine, for example T*X works.

The problem with vectorize is that it passes to your function a tuple of scalar values taken from your T and X. It's meant to vectorize a function that takes scalars, not a scalar and vector.

Lets redefine the function so it doesn't care about the shape or type of inputs:

In [101]: def myfunction(t,x):
     print(t)
     print(x)
     return t*x
   .....: 
In [102]: f=np.vectorize(myfunction)
In [103]: f(T,X)
0
1.0
0
1.0
1
...
1.0
Out[103]: 
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

Even though T is 1d and X is 2d, it is passing 2 scalars to the function at each iteration.


But first let me step back a bit.

vectorize can make it easier to 'broadcast' values, but it does not speed things up much. It still iterate through the values, calling your function for each set. It does not change your function at all. It is just a wrapper, just like your for loop.

appply_along/over_axis is also an iteration wrapper. Again no speed up.

Dito for the map expression.

...

For a function that takes a scalar and vector, and can't be changed internally, your for loop is about as good as it gets. Most alternatives will be harder to get right, be more obscure, and probably not much faster.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Maybe I can slice it up into tuples and use something like Python 3.5's Pool.starmap? I am not sure if that will be faster. But it is something. Thank you for your insights. – Thomas Antony Nov 08 '15 at 03:28
  • 1
    Here's one of several `numpy` `multiprocessing` (or `pool`) SO questions: http://stackoverflow.com/questions/25888255/how-to-use-python-multiprocessing-pool-map-to-fill-numpy-array-in-a-for-loop. I contributed an answer, but am not an expert on multiprocessing. – hpaulj Nov 08 '15 at 03:54