I have an alogirithm that I am trying to parallelize, because of very long run times in serial. However, the function that needs to be parallelized is inside a class. multiprocessing.Pool
seems to be the best and fastest way to do this, but there is a problem. It's target function can not be a function of an object instance. Meaning this; you declare a Pool
in the following way:
import multiprocessing as mp
cpus = mp.cpu_count()
poolCount = cpus*2
pool = mp.Pool(processes = poolCount, maxtasksperchild = 2)
And then actually use it as so:
pool.map(self.TargetFunction, args)
But this throws an error, because object instances cannot be pickled, as the Pool
function does to pass information to all of its child processes. But I have to use self.TargetFunction
So I had an idea, I would create a new Python file named parallel
and simply write a couple of functions without putting them in a class, and call those functions from within my original class (of whose function I want to parallelize)
So I tried this:
import multiprocessing as mp
def MatrixHelper(args):
WM = args[0][0]
print(WM.CreateMatrixMp(*args))
return WM.CreateMatrixMp(*args)
def Start(sigmaI, sigmaX, numPixels, WM):
cpus = mp.cpu_count()
poolCount = cpus * 2
args = [(WM, sigmaI, sigmaX, i) for i in range(numPixels)]
print('Number of cpu\'s to process WM:%d'%cpus)
pool = mp.Pool(processes = poolCount, maxtasksperchild = 2)
tempData = pool.map(MatrixHelper, args)
return tempData
These functions are not part of a class, using MatrixHelper
in Pool
s map
function works fine. But I realized while doing this that it was no way out. The function in need of parallelization (CreateMatrixMp
) expects an object to be passed to it (it is declared as def CreateMatrixMp(self, sigmaI, sigmaX, i)
)
Since it is not being called from within its class, it doesn't get a self
passed to it. To solve this, I passed the Start
funtion the calling object itself. As in, I say parallel.Start(sigmaI, sigmaX, self.numPixels, self)
. The object self
then becomes WM
so that I will be able to finally call the desired function as WM.CreateMatrixMp()
.
I'm sure that that is a very sloppy way of coding, but I just wanted to see if it would work. But nope, pickling error again, the map
function cannot handle any objects instances at all.
So my question is, why is it designed this way? It seems useless, it seems to be completely disfunctional in any program that uses classes at all.
I tried using Process
rather than Pool
, but this requires the array that I am ultimately writing to to be shared, which requires processes waiting for eachother. If I don't want it to be shared, then I have each process write its own smaller array, and do one big write at the end. But both of these result in slower run times than when I was doing this serially! Pythons builtin multiprocessing
seems absolutely useless!
Can someone please give me some guidance as to how to actually save time with multiprocessing, in the context of my tagret function being inside a class? I have read on posts here to use pathos.multiprocessing
instead, but I am on Windows, and am working on this project with multiple people who all have different set ups. Having everyone try to install it would be inconveinient.