
Hello to the community,

I started learning Python about 2 months ago, and I would now like to see whether it is possible to solve ODEs in parallel in Python.

I have this code for solving the Lotka-Volterra equations in serial.

import numpy as np
from scipy import integrate
import pandas as pd

def derivative(X, t, alpha, beta, delta, gamma):
    # Lotka-Volterra right-hand side: X = (prey, predator)
    x, y = X
    dotx = x * (alpha - beta * y)
    doty = y * (-delta + gamma * x)
    return np.array([dotx, doty])


delta = 1.
gamma = 1.
x0 = 4.
y0 = 2.


Nt = 1000
tmax = 30.
t = np.linspace(0.,tmax, Nt)
X0 = [x0, y0]


betas = np.arange(0.9, 1.4, 0.1)
alphas = np.arange(0.9, 1.4, 0.1)


df = pd.DataFrame({"time": t})

for beta, i in zip(betas, range(len(betas))):
    for alpha, j in zip(alphas, range(len(alphas))):
        print("solving for: \n alpha: %s \n beta: %s \n" %(alpha, beta))
        res = integrate.odeint(derivative, X0, t, args = (alpha,beta, delta, gamma))
        x, y = res.T
        df = pd.concat([df, pd.DataFrame({str(alpha)+'+'+ str(beta)+'_x' : x})], axis=1)
        df = pd.concat([df, pd.DataFrame({str(alpha)+'+'+ str(beta)+'_y' : y})], axis=1)

I have a parameter range of alphas and betas. Currently they are small, so this runs without a problem. But if the vectors have ten times as many values, e.g. betas = np.arange(0.9, 1.4, 0.01), the code takes quite some time to complete. I would like to know if it is possible to parallelize this: split the alphas and betas vectors across different processors, solve everything, and put the results into a pandas DataFrame to create a .csv file.

Best Regards

R. Smith
  • The easiest way to do that is using `map()` in the multiprocessing module. A simple example can be found here: https://docs.python.org/3/library/multiprocessing.html#introduction – Nick ODell Apr 22 '23 at 15:57

2 Answers


I started learning Python about 2 months ago, and I would now like to see whether it is possible to solve ODEs in parallel in Python.

Before you do this, it's useful to do a sanity check and confirm that the thing you're optimizing is actually the most expensive part of the program.

A tool I find useful for this is line_profiler, which can check how much time is spent executing each line of a function.
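
For reference, one minimal way to get output like the listing below, assuming the loop from the question is wrapped in a function named solve_orig and that you are in a Jupyter/IPython session, is the line_profiler extension:

%load_ext line_profiler
%lprun -f solve_orig solve_orig()

Outside a notebook, you can instead decorate solve_orig with @profile and run the script with kernprof -l -v script.py.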

Here's what it says about this program. (I am using np.arange(0.9, 1.4, 0.01) for both betas and alphas.)

Timer unit: 1e-09 s

Total time: 278.417 s
File: /tmp/ipykernel_10112/2335010708.py
Function: solve_orig at line 28

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    28                                           def solve_orig():
    29         1     379527.0 379527.0      0.0      df = pd.DataFrame({"time": t})
    30                                           
    31        50     125737.0   2514.7      0.0      for beta, i in zip(betas, range(len(betas))):
    32      2500    9630151.0   3852.1      0.0          for alpha, j in zip(alphas, range(len(alphas))):
    33                                           #             print("solving for: \n alpha: %s \n beta: %s \n" %(alpha, beta))
    34      2500 12117188243.0 4846875.3      4.4              res = integrate.odeint(derivative, X0, t, args = (alpha,beta, delta, gamma))
    35      2500    9580045.0   3832.0      0.0              x, y = res.T
    36      2500 133427971683.0 53371188.7     47.9              df = pd.concat([df, pd.DataFrame({str(alpha)+'+'+ str(beta)+'_x' : x})], axis=1)
    37      2500 132852203476.0 53140881.4     47.7              df = pd.concat([df, pd.DataFrame({str(alpha)+'+'+ str(beta)+'_y' : y})], axis=1)

The call to odeint() costs about 5% of your total run time. Even if you optimized it away entirely, you'd gain at most about 5% in speed. The program is actually spending roughly 95% of its time appending columns to your DataFrame.

But why? Here's a post which explains what's going wrong.

Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.

pd.concat returns a new DataFrame. Space has to be allocated for the new DataFrame, and data from the old DataFrames have to be copied into the new DataFrame.

More information: Why does concatenation of DataFrames get exponentially slower?

To avoid this, you can build up a dictionary of dataframe columns, and convert to a dataframe at the very end.

def solve():
    columns = {"time": t}
    for beta, i in zip(betas, range(len(betas))):
        for alpha, j in zip(alphas, range(len(alphas))):
        # print("solving for: \n alpha: %s \n beta: %s \n" % (alpha, beta))
            res = integrate.odeint(derivative, X0, t, args = (alpha,beta, delta, gamma))
            x, y = res.T
            columns[str(alpha)+'+'+ str(beta)+'_x'] = x
            columns[str(alpha)+'+'+ str(beta)+'_y'] = y
    df = pd.DataFrame(columns)
    return df

Re-benchmarking this, I find that it is 24x faster, and now spends 98% of its time inside odeint().

Nick ODell
  • This is very good advice! But if I have a machine with 12 cores and I decide to scale this further, shouldn't I try to make use of them? Split the arrays across the processors; since the equations are not related, they can be solved on each processor and the results combined, somehow. – R. Smith Apr 23 '23 at 09:04

Check out Numba and/or the multiprocessing module.

Note: if you end up using Numba, you should try to avoid Python loops and use NumPy's built-in functions instead.
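
For the multiprocessing route, here is a minimal sketch (not taken from either answer): it reuses derivative, X0, t, delta and gamma as defined in the question, while the helper solve_one and the file name results.csv are introduced here for illustration. Each (alpha, beta) pair is an independent ODE solve, so Pool.starmap can distribute the pairs across worker processes and the columns are assembled once in the parent process:

import numpy as np
import pandas as pd
from itertools import product
from multiprocessing import Pool
from scipy import integrate

def derivative(X, t, alpha, beta, delta, gamma):
    # Lotka-Volterra right-hand side, as in the question
    x, y = X
    dotx = x * (alpha - beta * y)
    doty = y * (-delta + gamma * x)
    return np.array([dotx, doty])

delta, gamma = 1., 1.
X0 = [4., 2.]
t = np.linspace(0., 30., 1000)
alphas = np.arange(0.9, 1.4, 0.1)
betas = np.arange(0.9, 1.4, 0.1)

def solve_one(alpha, beta):
    # One independent solve; this runs in a worker process
    res = integrate.odeint(derivative, X0, t, args=(alpha, beta, delta, gamma))
    x, y = res.T
    return alpha, beta, x, y

if __name__ == "__main__":
    pairs = list(product(alphas, betas))
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.starmap(solve_one, pairs)

    # Build the columns dictionary once, then convert to a DataFrame
    # (avoiding pd.concat in a loop, as explained in the other answer)
    columns = {"time": t}
    for alpha, beta, x, y in results:
        columns[str(alpha) + '+' + str(beta) + '_x'] = x
        columns[str(alpha) + '+' + str(beta) + '_y'] = y
    pd.DataFrame(columns).to_csv("results.csv", index=False)

Whether this is worth it depends on how expensive each individual solve is; for very cheap solves the process start-up and data transfer can eat the gains, so it is worth timing both versions.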

SimonUnderwood