
I want to call the buildWorld method of each of several world objects. This is a computationally expensive and time-consuming task, sometimes taking several hours depending on the settings you give each world.

After initially running into the issue that pickle couldn't serialize my object methods (described here and in several questions like here and here), I tried using pathos. Multiprocessing did not improve performance, so I timed pure serialization of my objects to see whether that was the source of the slowdown, but it wasn't. Nevertheless, I eventually switched to creating each world inside the child process rather than passing it in, then saving it to a file that the parent process can reload. That way nothing has to be serialized on the return (and also not on child-process start, I assume?). I came across several similar questions (A & B) and tried a few things (see below), but none of the answers did the trick for me.
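A serialization timing check like the one described above could look roughly like the sketch below. `FakeWorld` and `time_pickle_round_trip` are hypothetical names; the real world class is not shown in the question. It uses the standard-library `pickle`. pathos (via `multiprocess`) serializes with `dill`, which offers the same `dumps`/`loads` interface and can be swapped in to time what pathos actually does.

```python
import pickle
import time


class FakeWorld:
    """Hypothetical stand-in for a world object; the real class is not shown."""

    def __init__(self, world_num, size=100_000):
        self.world_num = world_num
        # Some bulky state so serialization has real work to do.
        self.cells = list(range(size))


def time_pickle_round_trip(obj, repeats=5):
    """Return (average seconds per dumps+loads, restored copy of obj)."""
    start = time.perf_counter()
    for _ in range(repeats):
        payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        restored = pickle.loads(payload)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, restored


if __name__ == "__main__":
    seconds, _ = time_pickle_round_trip(FakeWorld(0))
    print(f"pickle round trip: {seconds * 1000:.2f} ms")
```

If this number is tiny compared with the hours `buildWorld()` takes, serialization can be ruled out as the bottleneck, which matches the finding above.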

Unfortunately, nothing actually sped up my code. I could see that child processes were being created, but with 3 subprocesses, for example, my code ran 3x slower, so overall it took just as long as running a single process. Any guidance is appreciated!
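To quantify the "3x slower" observation, a minimal benchmark can compare serial and parallel wall-clock time on a stand-in workload. This is a sketch under assumptions: `fake_build_world` is a hypothetical pure-Python CPU-bound substitute for `buildWorld()`, and it swaps in the standard-library `concurrent.futures.ProcessPoolExecutor` instead of pathos so it runs without extra packages (the function is module-level, so plain pickle can ship it to the workers).

```python
import time
from concurrent.futures import ProcessPoolExecutor


def fake_build_world(world_num, n=2_000_000):
    """Hypothetical CPU-bound stand-in for buildWorld(): pure-Python arithmetic."""
    total = 0
    for i in range(n):
        total += (i * world_num) % 7
    return total


def wall_time(world_nums, workers):
    """Build every world and return (elapsed seconds, results).

    workers == 1 runs serially in this process; otherwise a process pool is used.
    """
    start = time.perf_counter()
    if workers == 1:
        results = [fake_build_world(n) for n in world_nums]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(fake_build_world, world_nums))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    worlds = range(3)
    serial, _ = wall_time(worlds, workers=1)
    parallel, _ = wall_time(worlds, workers=3)
    # With 3 idle cores, parallel time should be well below serial time
    # (ideally near serial / 3, minus process start-up overhead).
    print(f"serial {serial:.2f}s, parallel {parallel:.2f}s, speedup {serial / parallel:.2f}x")
```

If this toy workload speeds up on the same machine but the real `buildWorld()` does not, the limit is more likely in the workload itself (shared cores, memory bandwidth, internal threading) than in how the processes are launched.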

Method 1 (Pathos ProcessingPool):

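import pathos.multiprocessing as mp

def run_world_building(worldNum):
    myWorld = emptyWorld(worldNum)  # not expensive (emptyWorld is my own class)
    myWorld.buildWorld()            # very expensive
    myWorld.save()                  # create a file with world info

# The guard is required on platforms that spawn child processes (Windows, macOS)
if __name__ == "__main__":
    p = mp.ProcessingPool(3)
    p.map(run_world_building, range(3))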

Method 2 (multiprocess, without the '-ing', using individual Process objects):

import multiprocess as mp

def run_world_building(worldNum):
    myWorld = emptyWorld(worldNum)  # not expensive
    myWorld.buildWorld()            # very expensive
    myWorld.save()                  # create a file with world info

if __name__ == "__main__":
    processes = [mp.Process(target=run_world_building, args=(i,)) for i in range(3)]

    # I separated the start and join loops, but
    # not sure if that's entirely necessary
    for p in processes:
        p.start()
    for p in processes:
        p.join()

Method 3 (Using ThreadPool):

from pathos.pools import ThreadPool

def run_world_building(worldNum):
    myWorld = emptyWorld(worldNum)  # not expensive
    myWorld.buildWorld()            # very expensive
    myWorld.save()                  # create a file with world info

p = ThreadPool(3)
p.map(run_world_building, range(3))

Pran
    What is the CPU utilization with and without multiprocessing? Chances are you were already at 100%. It could also be a memory bottleneck, but that one is slightly harder to pin down. – Ahmed AEK Oct 20 '22 at 06:36
    It would also be nice to know what kind of workload we are talking about: vector math? Creation of Python objects? Linear algebra? Image processing? Loading data from disk? GPU work? "Expensive" is rather broad. – Ahmed AEK Oct 20 '22 at 06:46
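Two quick checks can help answer the comments above. First, whether the process can actually use more than one core: if it is restricted to a single CPU (for example by a container limit or CPU affinity), three worker processes would share that core, which would match the "3x slower" symptom. Second, if `buildWorld()` relies on NumPy/SciPy linear algebra (an assumption, since the workload isn't described), those libraries may already run several threads per process. In that case the CPU is likely already near 100%, and extra processes just compete for the same cores. The helper names below are hypothetical; the environment variables are the usual thread-count knobs read by OpenMP, OpenBLAS and MKL, and they must be set before NumPy is imported in a worker to take effect.

```python
import os

# Thread-count variables read by common BLAS/OpenMP backends (used by NumPy, etc.).
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def usable_cpus():
    """Number of CPUs this process may run on.

    os.sched_getaffinity is only available on some platforms (e.g. Linux);
    elsewhere fall back to os.cpu_count(), which counts all logical CPUs.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def thread_settings():
    """Report which thread-count variables are set in the environment (None if unset)."""
    return {name: os.environ.get(name) for name in THREAD_VARS}


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("usable by this process:", usable_cpus())
    print("BLAS/OpenMP thread settings:", thread_settings())
```

If `usable_cpus()` reports 1, or if a single `buildWorld()` run already keeps every core busy in a system monitor, running worlds in parallel processes cannot be expected to reduce total time.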
