
I am writing an app in Python 3.9 that generates images, and I'm trying to use multiprocessing to speed it up. I have this class:

import multiprocessing
from multiprocessing import Pool

class FileWriter:

   def __init__(self, a1, a2):
      self.attr1 = a1
      self.attr2 = a2

   def write_files(self, param1, param2):
      # uses self.attr1 and self.attr2
      ...

   def start_processes(self):
      ...
      with Pool(processes=min(4, multiprocessing.cpu_count())) as pool:
         for y in range(3):
            for x in range(3):
               pool.apply_async(self.write_files, (x, y))

The `pool.apply_async` call does not run when I run the program normally; it just skips over the `write_files` method. However, if I debug and step over the `pool.apply_async` call, `write_files` executes (but only when I step over it). I originally thought this was a pickling problem, as seen here, since I am passing an instance method (with `self`) to `apply_async`, but I tried both dill and pathos and neither solved the problem.

How do I get the `write_files` method to run normally?

afriedman111
  • As an aside, `pool` operations need to serialize the data being passed to subprocesses. You are usually better off writing worker functions that minimize what's being passed around instead of using instance methods, which may include more data than you want. – tdelaney Sep 28 '22 at 18:00
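
  For illustration, a minimal sketch of that approach (the `render_tile` worker and its arguments are hypothetical stand-ins for the real image work):

      from multiprocessing import Pool

      def render_tile(args):
          # Hypothetical module-level worker: only these small values are
          # pickled and sent to the subprocess, not a whole FileWriter object.
          x, y, out_dir = args
          # ... generate and write the image for tile (x, y) into out_dir ...

      if __name__ == "__main__":
          tasks = [(x, y, "out") for x in range(3) for y in range(3)]
          with Pool(processes=4) as pool:
              pool.map(render_tile, tasks)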

1 Answer


You started a bunch of async operations but didn't wait for them to complete. As soon as the last async job is posted, you exit the `with` block, which terminates the pool before the workers get a chance to run. When you single-step in the debugger, you give the processes time to complete. Since you aren't doing any other work in this example, `map` is likely the better choice:

  with Pool(processes=min(4, multiprocessing.cpu_count())) as pool:
      pool.map(self.write_files, ((x, y) for x in range(3) for y in range(3)))
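
If you did want to keep `apply_async`, you would have to hold on to the `AsyncResult` objects and wait on them before leaving the `with` block, along these lines:

  with Pool(processes=min(4, multiprocessing.cpu_count())) as pool:
      results = [pool.apply_async(self.write_files, (x, y))
                 for y in range(3) for x in range(3)]
      # Block until every job finishes; get() also re-raises any
      # exception that occurred in a worker.
      for r in results:
          r.get()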
tdelaney
  • This worked, but I had issues using `pool.map` with multiple parameters. Instead I used `pool.starmap(self.write_files, [(x, y) for x in range(3) for y in range(3)])`. Does `map`/`starmap` execute everything in parallel and block execution until it is finished? – afriedman111 Sep 28 '22 at 19:53
  • @afriedman111 - they are almost the same. `map` passes a single parameter (the tuple), and the worker needs to expand that tuple into the arguments. `starmap` expands the tuple for you (it does `*args`, hence "star" map). Otherwise they are the same, and yes, both distribute the jobs across the pool and block until all the results are in. – tdelaney Sep 28 '22 at 19:55
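
  To make the difference concrete, a minimal sketch with a hypothetical two-argument worker:

      from multiprocessing import Pool

      def add(x, y):            # two-parameter worker, suitable for starmap
          return x + y

      def add_tuple(args):      # one-parameter wrapper, suitable for map
          return add(*args)

      if __name__ == "__main__":
          with Pool(2) as pool:
              print(pool.map(add_tuple, [(1, 2), (3, 4)]))   # [3, 7]
              print(pool.starmap(add, [(1, 2), (3, 4)]))     # [3, 7]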