Numeric Integration Python versus Matlab

Question

My python code takes about 6.2 seconds to run. The Matlab code runs in under 0.05 seconds. Why is this and what can I do to speed up the Python code? Is Cython the solution?

Matlab:

function X=Test

nIter=1000000;
Step=.001;
X0=1;

X=zeros(1,nIter+1); X(1)=X0;

tic
for i=1:nIter
    X(i+1)=X(i)+Step*(X(i)^2*cos(i*Step+X(i)));
end
toc

figure(1) plot(0:nIter,X)

Python:

nIter = 1000000
Step = .001
x = np.zeros(1+nIter)
x[0] = 1
start = time.time()
for i in range(1,1+nIter):
      x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
end = time.time()
print(end - start)

I think this is a little different since I am integrating. Though perhaps I am mistaken...Additionally, I eventually want to have multiple arrays interacting with each other at each step. For example, perhaps there is another array, y, and x at time t is not only a function of x at time t-1 but also a function of y at time t-1. — DonkeySaddle, Feb 26 '19 at 03:26
The answers to https://stackoverflow.com/q/2133031/7517724 might help. — Craig, Feb 26 '19 at 04:01
The answers to https://stackoverflow.com/q/30475410/7517724 are also relevant. — Craig, Feb 26 '19 at 04:11
Python/numpy is like an older MATLAB - fast with the `whole-array` operations, slow with the interpreted iterations. Now MATLAB does a lot of jit compiling, allowing you to get away with iterative stuff that used to be too slow. — hpaulj, Feb 26 '19 at 05:45
Unmarked as a duplicate - this is not about mapping a function over an array, this is about performing an iterative computation where each element depends on the last — Eric, Feb 26 '19 at 08:29

score 2 · Answer 1 · answered Feb 26 '19 at 13:28

How to speed up your Python code

Your largest time sink is np.cos which performs several checks on the format of the input. These are relevant and usually negligible for high-dimensional inputs, but for your one-dimensional input, this becomes the bottleneck. The solution to this is to use math.cos, which only accepts one-dimensional numbers as input and thus is faster (though less flexible).

Another time sink is indexing x multiple times. You can speed this up by having one state variable which you update and only writing to x once per iteration.

With all of this, you can speed up things by a factor of roughly ten:

import numpy as np
from math import cos

nIter = 1000000
Step = .001
x = np.zeros(1+nIter)
state = x[0] = 1
for i in range(nIter):
    state += Step*state**2*cos(Step*i+state)
    x[i+1] = state

Now, your main problem is that your truly innermost loop happens completely in Python, i.e., you have a lot of wrapping operations that eat up time. You can avoid this by using uFuncs (e.g., created with SymPy’s ufuncify) and using NumPy’s accumulate:

import numpy as np
from sympy.utilities.autowrap import ufuncify
from sympy.abc import t,y
from sympy import cos

nIter = 1000000
Step = 0.001
state = x[0] = 1
f = ufuncify([y,t],y+Step*y**2*cos(t+y))

times = np.arange(0,nIter*Step,Step)
times[0] = 1
x = f.accumulate(times)

This runs practically within an instant.

… and why that’s not what you should worry about

If your exact code (and only that) is what you care about, then you shouldn’t worry about runtime anyway, because it’s very short either way. If on the other hand, you use this to gauge efficiency for problems with a considerable runtime, your example will fail because it considers only one initial condition and is a very simple dynamics.

Moreover, you are using the Euler method, which is either not very efficient or robust, depending on your step size. The latter (Step) is absurdly low in your case, yielding much more data than you probably need: With a step size of 1, You can see what’s going on just fine.

If you want a robust integration in such cases, it’s almost always best to use a modern adaptive integrator, that can adjust its step size itself, e.g., here is a solution to your problem using a native Python integrator:

from math import cos
import numpy as np
from scipy.integrate import solve_ivp

T = 1000
dt = 0.001

x = solve_ivp(
        lambda t,state: state**2*cos(t+state),
        t_span = (0,T),
        t_eval = np.arange(0,T,dt),
        y0 = [1],
        rtol = 1e-5
    ).y

This automatically adjusts the step size to something higher, depending on the error tolerance rtol. It still returns the same amount of output data, but that’s via interpolation of the solution. It runs in 0.3 s for me.

How to speed up things in a scalable manner

If you still need to speed up something like this, chances are that your derivative (f) is considerably more complex than in your example and thus it is the bottleneck. Depending on your problem, you may be able to vectorise its calcultion (using NumPy or similar).

If you can’t vectorise, I wrote a module that specifically focusses on this by hard-coding your derivative under the hood. Here is your example in with a sampling step of 1.

import numpy as np
from jitcode import jitcode,y,t
from symengine import cos

T = 1000
dt = 1

ODE = jitcode([y(0)**2*cos(t+y(0))])
ODE.set_initial_value([1])
ODE.set_integrator("dop853")
x = np.hstack([ODE.integrate(t) for t in np.arange(0,T,dt)])

This runs again within an instant. While this may not be a relevant speed boost here, this is scalable to huge systems.

This is also awesome, will take me some time to fully understand everything you've done here, but am determined to get to the bottom of it all. Many thanks! - DS — DonkeySaddle, Feb 26 '19 at 19:36

score 0 · Answer 2 · answered Feb 26 '19 at 10:54

The difference is jit-compilation, which Matlab uses per default. Let's try your example with Numba(a Python jit-compiler)

Code

import numba as nb
import numpy as np
import time

nIter = 1000000
Step = .001

@nb.njit()
def integrate(nIter,Step):
  x = np.zeros(1+nIter)
  x[0] = 1
  for i in range(1,1+nIter):
    x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
  return x

#Avoid measuring the compilation time,
#this would be also recommendable for Matlab to have a fair comparison
res=integrate(nIter,Step)

start = time.time()
for i in range(100):
  res=integrate(nIter,Step)

end=time.time()
print((end - start)/100)

This results in 0.022s runtime per call.

I just want to add that I used numba's njit() for a more complicated system of differential equations, ones take took on average about 1.6 seconds to integrate, and the run time is now .002 seconds! Thanks! -DS — DonkeySaddle, Feb 26 '19 at 22:51

Numeric Integration Python versus Matlab

2 Answers2

How to speed up your Python code

… and why that’s not what you should worry about

How to speed up things in a scalable manner