This question has been asked and solved a few times recently but I have quite a specific example...
I have a multiprocessing function that was working absolutely fine in complete isolation yesterday (in an interactive notebook). However, I decided to parameterise it so I can call it as part of a larger pipeline (and for abstraction/a cleaner notebook), and now it only uses a single process instead of 6.
import pandas as pd
import multiprocessing as mp
from multiprocessing import get_context

mp.set_start_method('forkserver')

def multiprocess_function(func, iterator, input_data):
    result_list = []

    # the callback runs in the parent process, so appending here collects results
    def append_result(result):
        result_list.append(result)

    # this pool uses 'fork', regardless of the 'forkserver' default set above
    with get_context('fork').Pool(processes=6) as pool:
        for i in iterator:
            pool.apply_async(func, args=(i, input_data), callback=append_result)
        pool.close()
        pool.join()

    return result_list

multiprocess_function(count_live, run_weeks, base_df)
My previous version of the code was executed differently: instead of a return/call, I used the following at the bottom of the function (which doesn't work at all now that I've parameterised - even with the args assigned):
if __name__ == '__main__':
    multiprocess_function()
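In case it helps, this is roughly what I mean by "even with the args assigned" - a guarded call with the parameters passed through (count_live, run_weeks and base_df are the same names from my call above):

if __name__ == '__main__':
    results = multiprocess_function(count_live, run_weeks, base_df)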
The function executes fine; it just runs on a single process, as per the output in top.
Apologies if this is something incredibly simple - I'm not a programmer, I'm an analyst :)
Edit: everything works absolutely fine if I include the if __name__ == '__main__': block at the bottom of the function and execute the cell; however, when I do this I have to remove the parameters - maybe it's just something to do with scoping. If I execute by calling the function, whether it is parameterised or not, it only runs on a single process.
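To make the two layouts concrete, this is a sketch of what I've observed (same names as above), not a solution:

# Works (all 6 processes spin up): guard at the bottom of the same cell,
# parameters removed and the names hard-coded inside the function
if __name__ == '__main__':
    multiprocess_function()

# Only ever one process: calling the function directly, from this cell
# or another, whether parameterised or not
results = multiprocess_function(count_live, run_weeks, base_df)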