Here is a simple program that runs in parallel. However, it has a problem when I want to pass a previous result into the next call of the applied function.
```python
import pandas as pd
import numpy as np
from pandarallel import pandarallel

pandarallel.initialize(nb_workers=8)  # nb_workers=NUMBER_OF_CPU_CORES


def dummy_fit(x, y_hint=0.5):
    # Imagine quite a complicated code here
    # y_hint is a previous fit. When it is not given, use default
    y = (x.mean() + y_hint) / 2
    return y


df = pd.DataFrame(np.random.random((10, 3)), columns=list("ABC"))
print("data:\n", df)
result = df.parallel_apply(dummy_fit, axis=1)
print(result)
```
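For reference, here is a minimal sequential sketch of what I mean by "using a previous result", assuming each row's fit is passed as `y_hint` to the fit of the next row. `sequential_fit` is a hypothetical helper name, not part of pandas or pandarallel:

```python
import numpy as np
import pandas as pd


def dummy_fit(x, y_hint=0.5):
    # Toy fit from the question; y_hint is the previous fit
    return (x.mean() + y_hint) / 2


def sequential_fit(df, y_hint=0.5):
    # Plain loop: each row's result becomes the hint for the next row
    results = []
    for _, row in df.iterrows():
        y_hint = dummy_fit(row, y_hint)
        results.append(y_hint)
    return pd.Series(results, index=df.index, dtype=float)


df = pd.DataFrame(np.random.random((10, 3)), columns=list("ABC"))
print(sequential_fit(df))
```

This loop is correct but single-core, which is exactly what I want to avoid.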
I could use a global variable, but there is only one of it, while there are several workers (8 here), and each would need its own previous result.
How can I make this work in parallel?
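If every row truly needs the exact result of the row before it, the computation is inherently sequential. One possible compromise, sketched under the assumption that `y_hint` is only a warm start (so restarting from the default at a few places is acceptable), is to split the rows into contiguous chunks, run each chunk sequentially in its own process, and concatenate the results. This swaps pandarallel for the standard library's `concurrent.futures`; `fit_chunk` and `chunked_fit` are hypothetical names:

```python
import concurrent.futures

import numpy as np
import pandas as pd


def dummy_fit(x, y_hint=0.5):
    # Toy fit from the question; y_hint is the previous fit (warm start)
    return (x.mean() + y_hint) / 2


def fit_chunk(chunk, y_hint=0.5):
    # Sequential inside one chunk: each row's fit seeds the next row.
    # The first row of every chunk starts from the default hint.
    results = []
    for _, row in chunk.iterrows():
        y_hint = dummy_fit(row, y_hint)
        results.append(y_hint)
    return pd.Series(results, index=chunk.index, dtype=float)


def chunked_fit(df, n_chunks=8, executor_cls=concurrent.futures.ProcessPoolExecutor):
    # Split row positions into contiguous blocks; drop empty blocks
    positions = np.array_split(np.arange(len(df)), n_chunks)
    chunks = [df.iloc[p] for p in positions if len(p) > 0]
    if not chunks:
        return pd.Series(dtype=float)
    # One task per chunk; executor.map returns results in submission order
    with executor_cls() as executor:
        parts = list(executor.map(fit_chunk, chunks))
    return pd.concat(parts)


if __name__ == "__main__":
    # The guard is needed for process pools on platforms that spawn workers
    df = pd.DataFrame(np.random.random((10, 3)), columns=list("ABC"))
    print(chunked_fit(df, n_chunks=4))
```

With `n_chunks=1` this reproduces the sequential loop; with more chunks, only the first row of each chunk falls back to the default hint, so the result differs from the fully sequential one at `n_chunks - 1` places.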