Say I have a function that returns different results for the same input and needs to be run multiple times on that input to obtain a mean (I'll sketch a trivial example, but in reality the source of randomness is train_test_split from sklearn.model_selection, if that matters):
    import numpy as np

    def f(a, b):
        output = []
        for i in range(b):
            # each iteration produces the mean of a fresh random sample
            output.append(np.mean(np.random.rand(a)))
        return np.mean(output)
The arguments for this function are defined inside another function, like so (again, a trivial example; please don't mind that these are not efficient/Pythonic):
    def g(c, d):
        a = c
        b = c * d
        result = f(a, b)
        return result
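For concreteness, the serial version is called along these lines (the values 1000 and 5 are just placeholders I made up for illustration):

    result = g(1000, 5)   # b = 5000, so f averages 5000 inner means
    print(result)         # close to 0.5, the mean of uniform [0, 1) draws

This runs fine; it is just slow once a and b get large.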
Instead of using a for loop, I want to use multiprocessing to speed up the execution. I found that neither pool.apply nor pool.starmap does the trick (execution time actually goes up); only pool.map works.
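Roughly, the starmap attempt looked like this (sketched from memory; the per-iteration task list of (a, 1) tuples is my best reconstruction, and g_starmap is just an illustrative name):

    import multiprocessing as mp

    def g_starmap(c, d):  # illustrative name, not the real function
        a = c
        b = c * d
        with mp.Pool(mp.cpu_count()) as pool:
            # one (a, 1) tuple per iteration: f runs b times, one loop pass each
            temp = pool.starmap(f, [(a, 1) for _ in range(b)])
        return np.mean(temp)

Despite running in parallel, this was slower than the plain loop, presumably because each task is tiny compared to the inter-process overhead.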
However, pool.map can only take a function of one argument (in this case, the number of iterations), so I tried redefining f as follows:
    def f(number_of_iterations):
        # the argument only exists to satisfy pool.map; the hope is that
        # a gets picked up from the calling scope
        output = np.mean(np.random.rand(a))
        return output
And then use pool.map as follows:
    import multiprocessing as mp

    def g(c, d):
        a = c
        b = c * d
        pool = mp.Pool(mp.cpu_count())
        # map f over the iteration indices; f now takes exactly one argument
        temp = pool.map(f, range(b))
        pool.close()
        pool.join()
        result = np.mean(temp)
        return result
Basically, this is a convoluted workaround to make f a one-argument function. The hope was that f would still pick up the variable a from the calling scope; however, executing g results in a NameError about a not being defined.
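To make the failure concrete, a minimal call (placeholder values again) dies in the worker processes:

    result = g(1000, 5)
    # NameError: name 'a' is not defined
    # (raised inside f in the workers, because a is local to g)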
Is there any way to make pool.map work in this context?