
I am numerically solving a system of first-order differential equations for x(t). The system is:

dx/dt = y
dy/dt = -x - a*y*(x^2 + y^2 - 1)

I have implemented the Forward Euler method to solve this problem as follows:

import numpy as np

def forward_euler():
    h = 0.01
    num_steps = 10000

    x = np.zeros([num_steps + 1, 2]) # steps, number of solutions
    y = np.zeros([num_steps + 1, 2])
    a = 1.

    x[0, 0] = 10. # initial condition 1st solution
    y[0, 0] = 5.

    x[0, 1] = 0.  # initial condition 2nd solution
    y[0, 1] = 0.0000000001

    for step in xrange(num_steps):
        x[step + 1] = x[step] + h * y[step]
        y[step + 1] = y[step] + h * (-x[step] - a * y[step] * (x[step] ** 2 + y[step] ** 2 - 1))

    return x, y

Now I would like to vectorize the code further and keep x and y in the same array. I have come up with the following solution:

def forward_euler_vector():
    num_steps = 10000
    h = 0.01

    x = np.zeros([num_steps + 1, 2, 2]) # steps, variables, number of solutions
    a = 1.

    x[0, 0, 0] = 10. # initial conditions 1st solution
    x[0, 1, 0] = 5.  

    x[0, 0, 1] = 0.  # initial conditions 2nd solution
    x[0, 1, 1] = 0.0000000001

    def f(x): 
        return np.array([x[1],
                         -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)])

    for step in xrange(num_steps):
        x[step + 1] = x[step] + h * f(x[step])

    return x

The question: forward_euler_vector() works, but is this the best way to vectorize it? I am asking because the vectorized version runs about 20 ms slower on my laptop:

In [27]: %timeit forward_euler()
1 loops, best of 3: 301 ms per loop

In [65]: %timeit forward_euler_vector()
1 loops, best of 3: 320 ms per loop
  • The "vectorized" version only really vectorizes `h * f(x[step])` or just two operations. The added cost of creating a numpy array offsets any speed gains. Depending on what you are doing you may want to look into [scipy.integrate.ode](http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html#scipy.integrate.ode). – Daniel May 15 '14 at 13:23

2 Answers


There is always the trivial autojit solution:

def forward_euler(initial_x, initial_y, num_steps, h):
    x = np.zeros([num_steps + 1, 2]) # steps, number of solutions
    y = np.zeros([num_steps + 1, 2])
    a = 1.

    x[0, 0] = initial_x[0] # initial condition 1st solution
    y[0, 0] = initial_y[0]

    x[0, 1] = initial_x[1]  # initial condition 2nd solution
    y[0, 1] = initial_y[1]

    for step in xrange(int(num_steps)):
        x[step + 1] = x[step] + h * y[step]
        y[step + 1] = y[step] + h * (-x[step] - a * y[step] * (x[step] ** 2 + y[step] ** 2 - 1))

    return x, y

Timings:

from numba import autojit
jit_forward_euler = autojit(forward_euler)

%timeit forward_euler([10,0], [5,0.0000000001], 1E4, 0.01)
1 loops, best of 3: 385 ms per loop

%timeit jit_forward_euler([10,0], [5,0.0000000001], 1E4, 0.01)
100 loops, best of 3: 3.51 ms per loop
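
Note that autojit was later removed from numba; under recent versions the rough equivalent is numba.njit. A sketch under that assumption (nopython mode wants tuple shapes for np.zeros and array arguments instead of lists; the function name forward_euler_njit is illustrative):

import numpy as np
from numba import njit

@njit
def forward_euler_njit(initial_x, initial_y, num_steps, h):
    # same scheme as above, written for nopython mode
    x = np.zeros((num_steps + 1, 2))
    y = np.zeros((num_steps + 1, 2))
    a = 1.0
    x[0, 0] = initial_x[0]
    y[0, 0] = initial_y[0]
    x[0, 1] = initial_x[1]
    y[0, 1] = initial_y[1]
    for step in range(num_steps):
        x[step + 1] = x[step] + h * y[step]
        y[step + 1] = y[step] + h * (-x[step] - a * y[step] * (x[step] ** 2 + y[step] ** 2 - 1))
    return x, y

# e.g. forward_euler_njit(np.array([10., 0.]), np.array([5., 1e-10]), 10000, 0.01)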
  • Do you have any idea why on my machine this does not result in a performance improvement? My `numba.test()` fails only on `test_pow_floats_array`, which is relevant here. I have version `0.12.1` of numba on anaconda, mac. – gg349 May 15 '14 at 14:36
  • @flebool No idea to be honest. Numba, in my opinion, has a lot to be done before the 1.0 release. As such I have not invested much time into the inner workings. – Daniel May 15 '14 at 14:41

@Ophion's comment explains very well what's going on: the call to array() within f(x) introduces overhead that kills the benefit of vectorizing the expression h * f(x[step]).

And as he says, you may be interested in having a look at scipy.integrate for a nice set of numerical integrators.

To solve the problem at hand of vectorizing your code, you want to avoid recreating the array every time you call f. You would like to initialize the array once and have f return it, modified, at every call. This is similar to a static variable in C/C++.

You can achieve this with a mutable default argument, which is evaluated once, when the function f(x) is defined, and has local scope. Since it has to be mutable, you wrap it in a single-element list:

def f(x, static_tmp=[np.empty((2, 2))]):
    static_tmp[0][0] = x[1]
    static_tmp[0][1] = -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)
    return static_tmp[0]

With this modification to your code, the overhead of array creation disappears, and on my machine I gain a small improvement:

%timeit forward_euler()        # 258 ms
%timeit forward_euler_vector() # 248 ms

This means that the gain from vectorizing these few operations with numpy is quite small, at least for the problem at hand.

You may also want to get rid of the function f altogether and perform its operations directly within the for loop, removing the call overhead. The default-argument trick can, however, also be applied with scipy's more general time integrators, where you must provide a function f, as in the sketch below.
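
For instance, a sketch of the same trick with scipy.integrate.odeint; reusing the preallocated buffer assumes odeint copies the returned derivative on each call, and the names here are illustrative:

import numpy as np
from scipy.integrate import odeint

def f(u, t, a=1.0, out=[np.empty(2)]):
    # the buffer is allocated once, at definition time, and reused
    du = out[0]
    du[0] = u[1]
    du[1] = -u[0] - a * u[1] * (u[0] ** 2 + u[1] ** 2 - 1)
    return du

t = np.linspace(0.0, 100.0, 10001)
sol = odeint(f, [10.0, 5.0], t)  # sol has shape (len(t), 2)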

EDIT: as pointed out by Jaime, another way to go is to treat static_tmp as an attribute of the function f, creating it after the function has been declared but before it is called:

def f(x):
    f.static_tmp[0] = x[1]
    f.static_tmp[1] = -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)
    return f.static_tmp
f.static_tmp = np.empty((2, 2))
  • Not sure if I like your method better, but I had always seen "static" variables in Python defined as members of the function's class, see e.g. [this](http://stackoverflow.com/questions/279561/what-is-the-python-equivalent-of-static-variables-inside-a-function). – Jaime May 15 '14 at 14:11
  • @Jaime, I think that's another way to go, and thinking about it, it looks much cleaner (no need to deal with the first element of the list every time). I'll update the answer with the other option. – gg349 May 15 '14 at 14:18