How can I parallelize computations and store the results in a common DataFrame? My function mutates a global DataFrame df by inserting values.
import multiprocessing as mp
import numpy as np
import pandas as pd
def f(u): df.loc[u] = u**2
# single-core computation:
df = pd.DataFrame(np.zeros(10), index=range(10))
[f(u) for u in range(10)]
print(df.T)
# gives correct result
# 0 1 2 3 4 5 6 7 8 9
# 0 0.0 1.0 4.0 9.0 16.0 25.0 36.0 49.0 64.0 81.0
# multi-core computation:
df = pd.DataFrame(np.zeros(10), index=range(10))
pool = mp.Pool(2)
pool.map(f, range(10))
pool.close()
print(df.T)
# gives the wrong result: df is unchanged, because each worker
# process mutates its own copy of df, not the parent's
# 0 1 2 3 4 5 6 7 8 9
# 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
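For reference, one workaround I am aware of is to have the workers return their values and do the assignment in the parent process. This does produce the correct result, but it is not the in-place mutation pattern I am asking about (the function `g` below is just an illustrative rename of `f` that returns instead of mutating; on Windows/macOS the pool code would additionally need an `if __name__ == "__main__":` guard):

```python
import multiprocessing as mp
import numpy as np
import pandas as pd

def g(u):
    # workers return values instead of mutating shared state
    return u**2

df = pd.DataFrame(np.zeros(10), index=range(10))
with mp.Pool(2) as pool:
    # pool.map returns the results to the parent in input order,
    # so they can be assigned to the column in one step
    df[0] = pool.map(g, range(10))
print(df.T)
```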
It is easy to achieve the same task in Julia:
using DataFrames
function f(u) df[u,:v]=u^2 end
df = DataFrame(v=zeros(10));
Threads.@threads for u=1:10 f(u) end
I was hoping a simple solution like this would also be possible in Python.
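The closest Python analogue to the Julia threads version that I could put together is a thread pool: threads share the parent's memory, so the in-place writes are visible in df. This is only a sketch, though, since the GIL means CPU-bound work does not truly run in parallel, and pandas makes no thread-safety guarantees (here each thread writes a distinct row, which happens to work):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros(10), index=range(10))

def f(u):
    # threads share the parent's memory, so this write lands in df
    df.loc[u] = u**2

# leaving the with-block waits for all submitted tasks to finish
with ThreadPoolExecutor(max_workers=2) as ex:
    ex.map(f, range(10))
print(df.T)
```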