
I've written a genetic algorithm that approximates an image, using Python 3 and OpenCV. It creates a population of individuals, each of which draws circles of random position, size, color, and opacity onto a blank canvas. The fittest individuals eventually saturate the population after several hundred generations.

I tried to add multiprocessing because rendering the images takes time proportional to the population size, the number of circles, and the size of the target image (which determines the fineness of detail).

What I did is use multiprocessing.Pool, with the array of individual objects as the iterable, mapping back only the fitness and index. As a result, in the main process none of the individuals have their own canvas; in the worker processes, each individual renders its canvas and calculates its fitness (the difference from the target).

However, multiprocessing makes the whole program slower. The rendering itself seems to take about the same time as serial processing, but the run as a whole is slower because of the multiprocessing overhead.
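One source of overhead worth measuring first: `Pool.map` pickles every element of the iterable and ships it to a worker process. A minimal sketch of how to check that cost (the `Individual` class below is a hypothetical stand-in, assuming each individual holds a reference to the full-resolution target image, as `IndividualCircle.render` suggests):

```python
import pickle

import numpy as np

# Hypothetical stand-in for IndividualCircle: each individual keeps a
# reference to the full-resolution target image, so Pool.map has to
# pickle that image once per individual, every generation.
class Individual:
    def __init__(self, target):
        self.target = target
        self.genes = np.random.rand(800, 7)

target = np.zeros((1080, 1920, 4), np.uint8)   # assumed target size
ind = Individual(target)

payload = pickle.dumps(ind)
print(f"bytes pickled per individual: {len(payload):,}")  # ~8.3 MB for this size
```

If the number printed is large, most of each "parallel" iteration is spent serializing and copying data rather than rendering.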

import time
import multiprocessing as mp

import cv2
import numpy as np


class PopulationCircle:
    def renderPop(self, individual):
        individual.render()
        return [individual.index, individual.fitness]
class IndividualCircle:
    def render(self):
        self.genes.sort(key=lambda x: x[-1], reverse=False)
        self.canvas = np.zeros((self.height,self.width, 4), np.uint8)
        for i in range(self.maxCount):
            overlay=self.canvas.copy()
            cv2.circle(overlay, (self.genes[i][0], self.genes[i][1]), self.genes[i][2], (self.genes[i][3],self.genes[i][4],self.genes[i][5]), -1, lineType=cv2.LINE_AA)
            self.canvas = cv2.addWeighted(overlay, self.genes[i][6], self.canvas, 1-self.genes[i][6], 0)

        diff = np.absolute(np.array(self.target)- np.array(self.canvas))

        diffSum = np.sum(diff)

        self.fitness = diffSum

def evolution(mainPop, generationLimit):
    p = mp.Pool()

    for i in range(int(generationLimit)):
        start_time = time.time()
        result =[]
        print(f"""
-----------------------------------------
Current Generation: {mainPop.generation}
Initial Score: {mainPop.score}
-----------------------------------------
        """)

        #Multiprocessing used for rendering out canvas since it takes time.

        result = p.map(mainPop.renderPop, mainPop.population)

        #returns [individual.index, individual.fitness]; result is a list of lists
        result.sort(key = lambda x: x[0], reverse=False)

        #Once multiprocessing is done, we only receive fitness value and index. 
        for k in mainPop.population:
            k.fitness = result[k.index][1]
        mainPop.population.sort(key = lambda x: x.fitness, reverse = True)
        if mainPop.generation == 0:
            mainPop.score = mainPop.population[-1].fitness

        """
        Things to note:
            In main process, none of the individuals have a canvas since the rendering
            is done on a different process tree.
            The only thing that changes in this main process is the individual's 
            fitness.

            After calling .renderHD and .renderLD, the fittest member will have a canvas
            drawn in this process. 
        """

        end_time = time.time() - start_time
        print(f"Time taken: {end_time}")
        if i%50==0:
            mainPop.population[0].renderHD()
            cv2.imwrite( f"../output/generationsPoly/generation{i}.jpg", mainPop.population[0].canvasHD)

        if i%10==0:
            mainPop.population[0].renderLD()
            cv2.imwrite( f"../output/allGenPoly/image{i}.jpg", mainPop.population[0].canvas)

        mainPop.toJSON()
        mainPop.breed()



    p.close()
    p.join()

if __name__ == "__main__":
    # Creates the Population object.
    # __init__ generates self.population, an array of IndividualCircle
    # objects that contain the DNA and render methods.
    pop = PopulationCircle(targetDIR, maxPop, circleAmount, mutationRate, mutationAmount, cutOff)
    # Starts the loop.
    evolution(pop, generations)

With a population of 600 and 800 circles, serial took ~11 s/iteration on average; multiprocessing took ~18 s/iteration.

I'm very new to multiprocessing so any help would be appreciated.

Ariki
  • Your code is very complicated for a problem which, likely, is not. It would really help to have a simpler code where one does not have to scroll at each function or class call because in the end, one loses track of the execution flow. I agree with classes and functions making a code neater, but once the whole code works as expected. – Patol75 May 07 '19 at 07:04
  • I just modularized the classes into Population and Individual. I had to mash the relevant methods and functions into one piece for this post, so I guess it seems more complicated than it really is. – Ariki May 07 '19 at 08:12
  • 1
    There is overhead in starting processes and transferring data between them. – Mark Tolonen May 07 '19 at 13:57

1 Answer


The reason this happens is that OpenCV internally spawns a lot of threads. When you fork from the main process and run several worker processes, each of them creates its own set of OpenCV threads, resulting in a small avalanche. The problem is that these threads end up contending for locks and waiting on each other, which you can easily verify by profiling your code with cProfile.

The problem is described in the joblib docs, and that is also likely your solution: switch to joblib. I had a similar problem in the past; you will find it in this SO post.
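A minimal joblib version might look like this (the `fitness` function is a made-up placeholder, not the question's circle renderer):

```python
import numpy as np
from joblib import Parallel, delayed

def fitness(genes):
    # Made-up placeholder for the real render-and-compare step.
    return float(np.sum(genes ** 2))

population = [np.random.rand(800, 7) for _ in range(20)]

# With its default "loky" backend, joblib reuses worker processes across
# calls and works to limit the kind of thread oversubscription its docs
# describe.
scores = Parallel(n_jobs=-1)(delayed(fitness)(g) for g in population)
print(scores[:3])
```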

[EDIT] There is an extra piece of evidence and a solution here. In short, according to that post this is a known problem, but since OpenCV releases the GIL, you can use multithreading instead of multiprocessing and thereby avoid the overhead.

Lukasz Tracewski