Pandas pandarallel parallel_apply

Asked Dec 13 '22 at 11:36


Here is a simple program that runs in parallel. But it has a problem: I want each call to use the result of a previous call as its starting hint.

import pandas as pd
import numpy as np
from pandarallel import pandarallel

pandarallel.initialize(nb_workers=8)  # nb_workers=NUMBER_OF_CPU_CORES


def dummy_fit(x, y_hint=0.5):
    # Imagine some fairly complicated code here
    # y_hint is the result of a previous fit; when it is not given, the default is used
    y = (x.mean() + y_hint) / 2
    return y


df = pd.DataFrame(np.random.random((10, 3)), columns=list("ABC"))
print("data:\n", df)
result = df.parallel_apply(dummy_fit, axis=1)
print(result)

We could store the previous result in a global variable, but that gives only one value, while several workers run at the same time.

How can I make this work in parallel?
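To make the idea concrete, here is a minimal sketch of what "one hint per worker" could look like. It does not use pandarallel: it swaps in the standard library's ProcessPoolExecutor, and the helpers fit_chunk and chained_parallel_fit are hypothetical names made up for this illustration. The assumption is that it is acceptable for the hint to restart from the default at the beginning of each chunk, so the numbers will differ from a strictly sequential run over all rows.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor


def dummy_fit(x, y_hint=0.5):
    # Same toy fit as in the question
    return (x.mean() + y_hint) / 2


def fit_chunk(chunk, y_hint=0.5):
    """Fit the rows of one chunk in order, passing each result on as the next hint."""
    results = []
    for _, row in chunk.iterrows():
        y_hint = dummy_fit(row, y_hint)  # previous result becomes the new hint
        results.append(y_hint)
    return pd.Series(results, index=chunk.index)


def chained_parallel_fit(df, n_chunks=4):
    """Split df into contiguous row blocks and fit each block in its own process."""
    positions = np.array_split(np.arange(len(df)), n_chunks)
    chunks = [df.iloc[idx] for idx in positions if len(idx) > 0]
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        # Each process keeps its own hint chain, so no shared global is needed
        parts = list(pool.map(fit_chunk, chunks))
    return pd.concat(parts)


if __name__ == "__main__":
    df = pd.DataFrame(np.random.random((10, 3)), columns=list("ABC"))
    print(chained_parallel_fit(df, n_chunks=4))
```

The hint lives in a local variable inside fit_chunk, so every worker carries its own chain and nothing has to be shared between processes. Whether restarting the hint at each chunk boundary is good enough depends on how much the real fit relies on it.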


rowiwe5353
