I have a function that contains 4 nested for loops. The function takes in a dataframe and returns a new dataframe. Currently the function takes about 2 hours to run, and I need it to run in around 30 minutes.
I've tried multiprocessing using 4 cores, but I can't seem to get it to work. I start by splitting my input dataframe into a list of smaller chunks, one per trip (list_of_df):
all_trips = uncov_df.TRIP_NO.unique()
list_of_df = []
for trip in all_trips:
    list_of_df.append(uncov_df[uncov_df.TRIP_NO==trip])
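For reference, the same split can also be written with groupby, which makes a single pass over the frame instead of filtering it once per trip. This is only a minimal sketch on an invented toy frame: the TRIP_NO column name comes from the code above, while the VALUE column and all the values are made up. Note that groupby returns the groups in sorted key order and skips rows where TRIP_NO is NaN, so it is not an exact drop-in replacement if either of those matters.

```python
import pandas as pd

# Toy stand-in for uncov_df; the VALUE column and all values are hypothetical.
uncov_df = pd.DataFrame({
    "TRIP_NO": [101, 102, 101, 103, 102],
    "VALUE": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One chunk per trip, built in a single pass over the frame.
list_of_df = [chunk for _, chunk in uncov_df.groupby("TRIP_NO")]

print(len(list_of_df))               # number of distinct trips
print([len(c) for c in list_of_df])  # rows in each chunk
```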
I then tried mapping my function (transform_df) over this list of chunks using a Pool of 4 worker processes:
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(4) as p:
        df_uncov = list(p.map(transform_df, list_of_df))
    df = pd.concat(df_uncov)
When I run the above, the code cell freezes and nothing happens. Does anyone know what's going on?
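To make the problem easier to reproduce, here is a self-contained sketch of the same pattern written as a standalone script. The transform_df below is a hypothetical placeholder, since the real function with its nested loops is not shown in this post. The helper names split_by_trip and run_parallel, the VALUE column, and the toy data are likewise invented for illustration.

```python
import pandas as pd
from multiprocessing import Pool


def transform_df(chunk):
    # Hypothetical placeholder: the real transform_df is not shown in this post.
    # This stand-in just adds a running total of VALUE within the chunk.
    out = chunk.copy()
    out["RUNNING_TOTAL"] = out["VALUE"].cumsum()
    return out


def split_by_trip(df):
    # Same splitting logic as in the post: one sub-frame per unique TRIP_NO.
    return [df[df.TRIP_NO == trip] for trip in df.TRIP_NO.unique()]


def run_parallel(df, workers=4):
    # Map the worker over the chunks, then stitch the results back together.
    with Pool(workers) as p:
        parts = p.map(transform_df, split_by_trip(df))
    return pd.concat(parts)


if __name__ == "__main__":
    # Invented toy data; only the TRIP_NO column name comes from the post.
    uncov_df = pd.DataFrame({
        "TRIP_NO": [1, 1, 2, 2, 3],
        "VALUE": [10, 20, 30, 40, 50],
    })
    print(run_parallel(uncov_df))
```

Saving this to a file and running it with python from a terminal gives a cross-check against the notebook cell. The multiprocessing documentation notes that Pool examples will not work in the interactive interpreter, because the worker processes need to be able to import the __main__ module.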