
I would like to solve the Lotka-Volterra equations for a range of test parameters.

I would also like to append the results to a dictionary, to later create a pandas DataFrame and output the results as a .csv.

I have the following code:

import numpy as np
from scipy import integrate
import pandas as pd
import itertools
import multiprocessing as mtp


def derivative(X, t, alpha, beta, delta, gamma):
    x, y = X
    dotx = x * (alpha - beta * y)
    doty = y * (-delta + gamma * x)
    return np.array([dotx, doty])


delta = 1.
gamma = 1.
x0 = 4.
y0 = 2.


Nt = 1000
tmax = 30.
t = np.linspace(0.,tmax, Nt)
X0 = [x0, y0]


betas = np.arange(0.9, 1.4, 0.01)
alphas = np.arange(0.9, 1.4, 0.01)

paramlist = list(itertools.product(alphas, betas))

columns = {"time": t}

def function(params):
    alpha, beta = params
    res = integrate.odeint(derivative, X0, t, args = (alpha, beta, delta, gamma))
    x, y = res.T
    
    print(f"alpha: {alpha}, beta: {beta}")

    columns[str(alpha)+'+'+ str(beta)+'_x'] = x
    columns[str(alpha)+'+'+ str(beta)+'_y'] = y
    


nProcess = 4

with mtp.Pool(processes=nProcess) as pool:
    pool.map(function, paramlist)
        
df = pd.DataFrame(columns)
df.to_csv("test.csv", sep='\t')

However, I am not able to get the results into the columns dictionary. How can I get the results? Additionally, is there any way to sort the appends based on, e.g., beta? Or will this kill the parallel efficiency?

Best Regards

R. Smith
  • You can't do what you're doing that way, because multiprocessing creates a separate context and namespace when its environment is set up (the workers do not share the parent's memory space). If you want to exchange data between processes, you need to use a Queue from the multiprocessing lib or use the Manager library. You can see more info about this topic here: https://stackoverflow.com/questions/35157367/how-to-share-data-between-python-processes – Adrien Derouene Apr 24 '23 at 16:11
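To illustrate the comment above: a `multiprocessing.Manager` dict is one way to collect worker results in the parent process, since the proxy it hands out is picklable and forwards writes back to the manager. This is a minimal sketch (the `work` and `run_shared_demo` names are hypothetical, with a toy computation standing in for `odeint`):

```python
import multiprocessing as mtp


def work(args):
    key, value, shared = args
    # each worker writes into the Manager-backed dict; the proxy
    # forwards the update to the manager process, so the parent sees it
    shared[key] = value * 2


def run_shared_demo(n=4, workers=2):
    with mtp.Manager() as manager:
        shared = manager.dict()
        tasks = [(f"k{i}", i, shared) for i in range(n)]
        with mtp.Pool(processes=workers) as pool:
            pool.map(work, tasks)
        # copy to a plain dict before the manager shuts down
        return dict(shared)


if __name__ == "__main__":
    print(run_shared_demo())
```

Note that every write goes through the manager process, so this adds IPC overhead per item; for bulk results like the ODE trajectories here, returning values from the worker is usually cheaper.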

1 Answer


Use a ThreadPool instead:

from multiprocessing.pool import ThreadPool

nProcess = 4

with ThreadPool(processes=nProcess) as pool:
    results = pool.map(function, paramlist)

# add the results to `columns`
for res in results:
    # each `res` is a dict
    for k, v in res.items():
        columns[k] = v

# create the dataframe and write to csv
df = pd.DataFrame(columns)
df.to_csv("test.csv", sep='\t')

I've also modified the function() function for two reasons: to make it return a value, and to handle a possible error in the ODE computation:

def function(params):
    alpha, beta = params

    try:
        res = integrate.odeint(derivative, X0, t, args = (alpha, beta, delta, gamma))
        x, y = res.T
    except SystemError:
        # fallback values
        x = -999.0
        y = -999.0
    
    print(f"alpha: {alpha}, beta: {beta}")

    return {str(alpha)+'+'+ str(beta)+'_x': x,
            str(alpha)+'+'+ str(beta)+'_y': y}

As the name suggests, the ThreadPool instantiates threads instead of processes. A thread is lightweight and you can have many of them; there is also no need for mp.Queue or similar inter-process communication, since the threads share the same memory. A ThreadPool is also fast, making good use of multi-core CPUs.

Sample df output:

time  0.9+0.9_x  0.9+0.9_y  0.9+1.0_x  0.9+1.0_y  0.9+1.1_x  0.9+1.1_y  \
0  0.000000   4.000000   2.000000   4.000000   2.000000   4.000000   2.000000   
1  0.303030   2.327953   3.932511   2.152261   3.807377   1.992985   3.693019   
2  0.606061   0.914347   4.627565   0.797452   4.273218   0.699206   3.977724   
3  0.909091   0.360698   4.084337   0.309963   3.681627   0.267664   3.358718   
4  1.212121   0.174083   3.255389   0.150231   2.903337   0.129902   2.625089
Luca Anzalone
  • Thanks for the input! Is `ThreadPool` faster than `Pool` on your machine? Mine is far slower. `Pool`: 7.1728s, `ThreadPool`: 11.7716s. Also the IO portion is a very big bottleneck. – R. Smith Apr 24 '23 at 18:08
  • On my machine (still with `nProcess=4`) I got 7.8s for `ThreadPool` and 6.3s for creating and writing the `df`. I noticed that the final csv is around `90MB`, so maybe you can create and write a csv inside `function`: it would be one csv per (x, y) pair. – Luca Anzalone Apr 25 '23 at 09:06
  • That was part of what I was hoping to achieve. The IO portion is becoming the bottleneck and I do not know how to improve it :(. Additionally, by definition won't a `ThreadPool` be slower than a `Pool`? – R. Smith Apr 25 '23 at 09:35
  • 1
    I improved the IO portion by saving as `pickle` instead of `csv` – R. Smith Apr 25 '23 at 10:23
  • According to [this answer](https://stackoverflow.com/a/70701029/21113996) `ThreadPool` is still subject to GIL, but can be a better choice than `Pool` either if the task is I/O-bound or there is code that releases the GIL – Luca Anzalone Apr 25 '23 at 12:48
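As the comments above suggest, a process-based `Pool` also works once `function` returns its results instead of mutating a global dict: `pool.map` sends each return value back to the parent and preserves the input order, so the columns come out sorted exactly as `itertools.product` generated them (by alpha, then beta), with no extra sorting step. A sketch with a placeholder computation standing in for `odeint` (the `simulate` and `run_pool` names are hypothetical):

```python
import multiprocessing as mtp


def simulate(params):
    alpha, beta = params
    # placeholder for the odeint call; returns labelled results
    x = [alpha + beta, alpha * beta]
    return {f"{alpha}+{beta}_x": x}


def run_pool(paramlist, nprocess=2):
    # process-based Pool: results travel back via return values,
    # so no shared state is needed; map preserves input order,
    # which keeps the columns sorted by the paramlist ordering
    with mtp.Pool(processes=nprocess) as pool:
        results = pool.map(simulate, paramlist)
    columns = {}
    for res in results:
        columns.update(res)
    return columns
```

For the IO bottleneck, `pd.DataFrame(columns).to_pickle("test.pkl")` writes the binary format the last comment mentions, which avoids the float-to-text conversion cost of `to_csv`.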