
I am trying to optimize a function that is relatively expensive to evaluate. The function operates across a series of data points and can be evaluated in parallel. Each data point evaluation needs access to global data, so I am using ctypes and multiprocessing Arrays to share the data between processes, and multiprocessing Pool.map() to evaluate across the dataset in parallel.
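To make the setup concrete, the sharing pattern is roughly the one below; the names, sizes, and worker body are placeholders, not my real code:

'''
from multiprocessing import Pool, Array
import ctypes
import numpy as np

# lock-wrapped shared buffer created once in the parent; it is handed
# to the workers through the Pool initializer
shared = Array(ctypes.c_double, 1000)

def init_worker(arr):
    # each worker gets a numpy view over the same shared memory
    global dataset
    dataset = np.ctypeslib.as_array(arr.get_obj())

def work(i):
    # placeholder computation; each worker writes a distinct index,
    # so no explicit locking is needed here
    dataset[i] = i * 2.0

if __name__ == '__main__':
    with Pool(initializer=init_worker, initargs=(shared,)) as pool:
        pool.map(work, range(1000))
    # the parent sees the workers' writes through the same buffer
    result = np.ctypeslib.as_array(shared.get_obj())
    print(result[:5])
'''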

I am using scikit-optimize's Bayesian Gaussian process minimizer (gp_minimize) to optimize the function over a single input. Each evaluation of that input involves creating a dataset of a predetermined size.

I have a weird problem where the program hangs if I run the GP minimizer for more than 10 calls, but ONLY IF the dataset the parallel part of the function operates on is large. Otherwise, I can run the optimizer for up to 100 calls with no problem.

The basic scaffold is this:

'''
from multiprocessing import Pool, Array
import ctypes

import numpy as np
from skopt import gp_minimize

def func(x):

    size = 1000
    dataset = np.ctypeslib.as_array(Array(ctypes.c_double, size).get_obj())

    # I know this looks weird, but x is actually an array input
    x_shared = np.ctypeslib.as_array(Array(ctypes.c_double, 1).get_obj())

    # create the data based on the input value
    # (create_data and evaluate_data are defined elsewhere in my program)
    pool = Pool()
    pool.map(create_data, range(size))
    pool.close()
    pool.join()

    # evaluate the data
    pool = Pool()
    pool.map(evaluate_data, range(size))
    pool.close()
    pool.join()

    return np.mean(dataset)

if __name__ == '__main__':
    # bounds are illustrative; gp_minimize needs a search space
    gp_minimize(func, dimensions=[(0.0, 1.0)], n_calls=10)
'''

I realize the above code doesn't necessarily make sense with the two Pools, but that structure is necessary for my actual program, which is too large to post.

When I interrupt the hung program, I get the following traceback:

'''

File "ig_func.py", line 260, in opt_func
    pool2.map(compute_ig, range(nlpts))
  File "/opt/anaconda3/envs/obspy/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/anaconda3/envs/obspy/lib/python3.7/multiprocessing/pool.py", line 651, in get
    self.wait(timeout)
  File "/opt/anaconda3/envs/obspy/lib/python3.7/multiprocessing/pool.py", line 648, in wait
    self._event.wait(timeout)
  File "/opt/anaconda3/envs/obspy/lib/python3.7/threading.py", line 552, in wait
    signaled = self._cond.wait(timeout)
  File "/opt/anaconda3/envs/obspy/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()

'''

It appears the program gets stuck while map() is waiting on the result's condition variable (the final waiter.acquire() in the traceback). Again, the issue does not occur when the dataset I'm operating on is small. Any ideas?
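In case it helps anyone reproduce or narrow this down, the blocking map can be swapped for map_async with a timeout so the stall raises a multiprocessing.TimeoutError instead of hanging silently. A minimal sketch; the worker, iterable size, and timeout are placeholders:

'''
from multiprocessing import Pool, TimeoutError

def work(i):
    return i * i  # stands in for the real per-point evaluation

if __name__ == '__main__':
    pool = Pool()
    try:
        # get() with a timeout turns a silent hang into an exception
        results = pool.map_async(work, range(1000)).get(timeout=60)
    except TimeoutError:
        print("map stalled within the timeout window")
    finally:
        pool.close()
        pool.join()
'''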
