
Say I have a function that returns different results for the same input and needs to be run multiple times on that input to obtain a mean (I'll sketch a trivial example, but in reality the source of randomness is train_test_split from sklearn.model_selection, if that matters):

import numpy as np

def f(a, b):
    output = []
    for i in range(b):
        output.append(np.mean(np.random.rand(a,)))
    return np.mean(output)

The arguments for this function are defined inside another function, like so (again, a trivial example; please don't mind if these are not efficient/pythonic):

def g(c, d):
    a = c
    b = c * d
    result = f(a, b)
    return result

Instead of using a for loop, I want to use multiprocessing to speed up the execution time. I found that neither pool.apply nor pool.starmap do the trick (execution time goes up); only pool.map works. However, it can only take one argument (in this case, the number of iterations). I tried redefining f as follows:

def f(number_of_iterations):
    output = np.mean(np.random.rand(a,))  # a is expected to be picked up from the calling scope
    return output

And then using pool.map as follows:

import multiprocessing as mp

def g(c, d):
    a = c
    b = c * d
    pool = mp.Pool(mp.cpu_count())
    temp = pool.map(f, range(b))
    pool.close()
    result = np.mean(temp)
    return result

Basically, this is a convoluted workaround to make f a one-argument function. The hope was that f would still pick up a, but executing g results in a NameError saying that a is not defined.

Is there any way to make pool.map work in this context?

1 Answer


I think functools.partial solves your issue. Here is an implementation: https://stackoverflow.com/a/25553970/9177173 and here is the documentation: https://docs.python.org/3.7/library/functools.html#functools.partial
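A minimal sketch of how that could look with the two-argument f from the question (the range(b) iterable and the example arguments at the bottom are assumptions about what the real code does):

import multiprocessing as mp
from functools import partial

import numpy as np

def f(a, number_of_iterations):
    # one run: the mean of `a` random samples
    return np.mean(np.random.rand(a,))

def g(c, d):
    a = c
    b = c * d
    pool = mp.Pool(mp.cpu_count())
    # partial(f, a) fixes a as the first argument, leaving a one-argument
    # callable that pool.map can feed with the iteration numbers
    temp = pool.map(partial(f, a), range(b))
    pool.close()
    pool.join()
    return np.mean(temp)

if __name__ == '__main__':
    print(g(1000, 4))  # placeholder arguments

This way f keeps both arguments and a never has to exist in the workers' global namespace, which is what caused the NameError in the workaround above.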
