I am writing an algorithm in which the function Multiprocess() is supposed to call another function, CreateMatrixMp(), in parallel on as many processes as there are CPUs. I have never done multiprocessing before and cannot be certain which of the two methods below is more efficient. By "efficient" I mean well suited to CreateMatrixMp() potentially being called thousands of times. I have read all of the documentation on the Python multiprocessing module and have narrowed it down to these two possibilities:
The first uses the Pool class:
import multiprocessing as mp

def MatrixHelper(self, args):
    # Unpack the argument tuple, since pool.map() passes a single argument
    return self.CreateMatrixMp(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    poolCount = cpus * 2
    # One argument tuple per pixel index
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
The second uses the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    # One process per pixel index, regardless of the number of CPUs
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Pool seems to be the better choice: I have read that it incurs less overhead, and Process does not take the number of CPUs on the machine into account. The only problem is that using Pool in this manner gives me error after error, and whenever I fix one, a new one appears underneath it. Process seems easier to implement, and for all I know it may be the better choice. What does your experience tell you?
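For comparison, here is a minimal sketch of the Pool approach with the worker moved out of the class. It rests on an assumption about where the errors come from: pool.map() has to pickle the callable and its arguments, and a bound method such as self.MatrixHelper either cannot be pickled at all (Python 2) or drags the whole instance along with it (Python 3). The names create_matrix_row, build_args and run_with_pool are hypothetical, the arithmetic is a placeholder for the real matrix computation, and starmap() requires Python 3.3 or later.

```python
import multiprocessing as mp


def create_matrix_row(sigma_i, sigma_x, i):
    """Hypothetical stand-in for CreateMatrixMp(); placeholder arithmetic only."""
    return (i, sigma_i * i + sigma_x)


def build_args(sigma_i, sigma_x, num_pixels):
    """Build one argument tuple per pixel index."""
    return [(sigma_i, sigma_x, i) for i in range(num_pixels)]


def run_with_pool(sigma_i, sigma_x, num_pixels):
    """Run create_matrix_row over all pixel indices on a pool of workers."""
    # Pool() with no arguments starts one worker per available CPU.
    with mp.Pool() as pool:
        # starmap() unpacks each tuple into positional arguments and
        # returns the results in the same order as the inputs.
        return pool.starmap(create_matrix_row,
                            build_args(sigma_i, sigma_x, num_pixels))


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block when the
    # "spawn" or "forkserver" start method re-imports the module.
    for row in run_with_pool(0.5, 1.0, 8):
        print(row)
```

Because the pool reuses a fixed set of workers, thousands of calls cost thousands of small tasks rather than thousands of process start-ups, which is the overhead difference mentioned above.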
If Pool should be used, am I right in choosing map()? I would prefer that order be maintained. I wrote tempData = pool.map(...) because map() is supposed to return a list with the result of every call. I am not sure how Process handles its returned data.
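On the last point, the return value of a Process target is simply discarded, so results have to be sent back explicitly. The sketch below shows one possible way to do that with a multiprocessing.Queue, starting one process per CPU rather than one per pixel and tagging each result with its index so the original order can be restored. All function names are hypothetical and the arithmetic again stands in for the real computation.

```python
import multiprocessing as mp


def create_matrix_row(sigma_i, sigma_x, i):
    """Hypothetical stand-in for CreateMatrixMp(); placeholder arithmetic only."""
    return sigma_i * i + sigma_x


def split_indices(num_items, num_workers):
    """Deal the indices 0..num_items-1 round-robin into num_workers lists."""
    return [list(range(w, num_items, num_workers)) for w in range(num_workers)]


def worker(sigma_i, sigma_x, indices, out_queue):
    # Send each result back through the queue, tagged with its index.
    for i in indices:
        out_queue.put((i, create_matrix_row(sigma_i, sigma_x, i)))


def run_with_processes(sigma_i, sigma_x, num_pixels, num_workers=None):
    """Compute all rows on a fixed number of processes, in index order."""
    num_workers = num_workers or mp.cpu_count()
    out_queue = mp.Queue()
    procs = [mp.Process(target=worker,
                        args=(sigma_i, sigma_x, indices, out_queue))
             for indices in split_indices(num_pixels, num_workers)]
    for p in procs:
        p.start()
    # Drain the queue before join(): a child that still has buffered
    # queue data may not exit until that data has been read.
    tagged = [out_queue.get() for _ in range(num_pixels)]
    for p in procs:
        p.join()
    # Results arrive in arbitrary order; sort by index to restore it.
    return [value for _, value in sorted(tagged, key=lambda pair: pair[0])]


if __name__ == "__main__":
    print(run_with_processes(0.5, 1.0, 4))
```

This is considerably more bookkeeping than pool.map(), which already returns results in input order.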