I'm trying to parallelize my preprocessing function with Python's multiprocessing module (documented here: https://docs.python.org/3.4/library/multiprocessing.html?highlight=process).
It works fine on my computer (all 4 CPUs are used), but it doesn't seem to work when I run the same code as a Google Cloud ML Engine job. The job takes much longer than the sequential equivalent, and CPU utilisation drops to almost 0% at some point.
Here is what I tried:
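import multiprocessing as mp

# One worker per CPU reported by the OS
pool = mp.Pool(processes=mp.cpu_count())
params = [some_params_lists]  # each element is unpacked as the arguments of one call
# Blocks until every call has returned
pool.starmap(fn_to_run_in_parallel, params)
pool.close()
pool.join()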
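A minimal, self-contained sketch of the same pattern may help when comparing the two environments. It times a sequential run against a `Pool.starmap` run and prints how many CPUs the process can see. The function `preprocess`, the helpers `run_sequential` and `run_parallel`, and the parameter list are hypothetical stand-ins, not the real preprocessing code.

```python
import multiprocessing as mp
import os
import time


def preprocess(x, y):
    """Hypothetical CPU-bound stand-in for the real preprocessing function."""
    return sum(i * i for i in range(x)) + y


def run_sequential(params):
    # Baseline: call the function once per argument tuple in this process
    return [preprocess(*p) for p in params]


def run_parallel(params, processes=None):
    # processes=None lets Pool fall back to its default worker count.
    # starmap returns only after all results are in, so leaving the
    # with-block (which terminates the pool) is safe here.
    with mp.Pool(processes=processes) as pool:
        return pool.starmap(preprocess, params)


if __name__ == "__main__":
    params = [(200000, i) for i in range(16)]  # made-up workload

    print("mp.cpu_count():", mp.cpu_count())
    # os.sched_getaffinity is not available on every platform
    if hasattr(os, "sched_getaffinity"):
        print("CPUs usable by this process:", len(os.sched_getaffinity(0)))

    start = time.perf_counter()
    sequential = run_sequential(params)
    middle = time.perf_counter()
    parallel = run_parallel(params)
    end = time.perf_counter()

    print("same results:", sequential == parallel)
    print("sequential: %.3fs, parallel: %.3fs" % (middle - start, end - middle))
```

If the two CPU counts differ on the cloud machine, the pool may be starting more workers than there are cores actually available to the job, which would be one thing to rule out.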
I also tried using multiprocessing.Process(), without any luck.
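For reference, a `multiprocessing.Process` variant could look roughly like the sketch below. This is an assumption about the general shape of such an attempt, not the code that was actually run; `preprocess`, `run_with_processes`, and the input tuples are hypothetical.

```python
import multiprocessing as mp


def preprocess(x, y, out_queue):
    """Hypothetical stand-in: computes a value and reports it via a queue."""
    out_queue.put((x, y, x * y))


def run_with_processes(params):
    queue = mp.Queue()
    # One process per argument tuple
    procs = [mp.Process(target=preprocess, args=(x, y, queue)) for x, y in params]
    for p in procs:
        p.start()
    # Read every result before joining: a child that still has data
    # buffered for the queue may not exit until that data is consumed
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    # Arrival order is not guaranteed, so sort for a stable result
    return sorted(results)


if __name__ == "__main__":
    print(run_with_processes([(1, 2), (3, 4)]))
```

Unlike a pool, this starts one process per task, so with many tasks it can oversubscribe the machine.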
Configuration of the machine:
scaleTier = 'CUSTOM'
masterType = 'large_model'