I am trying to parallelize some calculations using the multiprocessing module.
How can I be sure that every process spawned by multiprocessing.Pool.map_async runs in a different (previously created) folder?
The problem is that each process calls a third-party library that writes temp files to disk, and if you run many of those in the same folder, they clobber each other's files.
Additionally, I can't create a new folder for every function call made by map_async; rather, I would like to create as few folders as possible (i.e., one per process).
The code would be similar to this:
import multiprocessing, os, shutil

# Defined before the pool is created so that forked workers can find it
def example_function(i):
    print(os.getcwd())
    return i * i

processes = 16

# Starting the pool
pool = multiprocessing.Pool(processes)

# The asked dark magic here?

devshm = '/dev/shm/'
# Creating as many folders as necessary (one per process)
for p in range(processes):
    folder = devshm + str(p) + '/'
    os.mkdir(folder)
    shutil.copy(some_files, folder)  # some_files: placeholder for the library's input files

result = pool.map_async(example_function, range(1000))
The goal is that, at any given time, every call of example_function is executed in a different folder.
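In other words, I imagine something like a Pool initializer that moves each worker into its own folder exactly once. Here is a rough sketch of what I have in mind (the initializer approach and the per-PID folder naming are just my guess at one way this could work):

import multiprocessing, os

devshm = '/dev/shm/'

def worker_init():
    # Runs once in each worker process: create a private folder
    # named after the worker's PID and move into it.
    workdir = os.path.join(devshm, str(os.getpid()))
    os.mkdir(workdir)
    os.chdir(workdir)

def example_function(i):
    print(os.getcwd())  # each worker should report its own folder
    return i * i

pool = multiprocessing.Pool(16, initializer=worker_init)
result = pool.map_async(example_function, range(1000))
print(result.get())

Whether the initializer argument is actually the right tool for this is exactly the part I am unsure about.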
I know that a solution might be to use subprocess to spawn the different processes, but I would like to stick to multiprocessing: with subprocess I would need to pickle some objects, write them to disk, read them back, and unpickle them for every spawned subprocess, rather than passing the objects directly through the function call (using functools.partial), as sketched below.
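For clarity, this is what I mean by passing the object directly through the function call with functools.partial (big_object below is just a stand-in for my real data):

import functools, multiprocessing

def example_function(big_object, i):
    # big_object gets pickled and shipped to each worker automatically
    return big_object[i % len(big_object)] * i

big_object = list(range(10000))  # stand-in for the real object
pool = multiprocessing.Pool(16)
result = pool.map_async(functools.partial(example_function, big_object), range(1000))
print(result.get())

With subprocess I would lose this, and would have to serialize big_object to disk myself for every spawned process.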
PS: This question is somewhat similar, but its solution doesn't guarantee that every function call takes place in a different folder, which is precisely my goal.