
Can anyone help me understand why a Python multiprocessing Pool is slow on Windows when running the code below?

        with Pool(processes=6, initializer=init_pool,
                  initargs=(x, y, z)) as p:
            res = pd.concat(p.imap(process_product, df.values))

It takes 2-3 minutes on Windows, while on Linux it takes less than a minute. Also, CPU utilisation does not go up; it stays at 25% at most. Please let me know whether the above code is fine to run on Windows in the first place.

I also added logging to the init_pool function to show the processes being spawned:

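    from multiprocessing import current_process
    from threading import current_thread

    def init_pool(x, y, z):
        # Runs once in each worker process when the pool starts.
        print("initialising thread {} and current process {}".format(
            current_thread().name, current_process()))

        # Keep the arguments as module-level globals inside this worker.
        global x1, y1, z1
        x1 = x
        y1 = y
        z1 = z
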
D.L
  • Does this answer your question? [Multiprocessing slower than serial processing in Windows (but not in Linux)](https://stackoverflow.com/questions/52465237/multiprocessing-slower-than-serial-processing-in-windows-but-not-in-linux) – Cow Mar 24 '22 at 12:06

1 Answer


You are correct, this is specific to Windows.

There are three ways to start a process:

  1. spawn
  2. fork
  3. forkserver

Of these three, only spawn is available on Windows.
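
If you want to confirm this on your own machine, a minimal sketch like the one below lists the start methods your platform supports and the one used by default. The variable names `available_methods` and `default_method` are just illustrative.

```python
import multiprocessing as mp

# Start methods supported on this platform.
# On Windows this list contains only "spawn".
available_methods = mp.get_all_start_methods()

# The start method used when none is chosen explicitly.
default_method = mp.get_start_method()

print("available:", available_methods)
print("default:", default_method)
```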

The docs describe spawn as follows:

> The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Starting a fresh Python interpreter for each worker is effectively what causes the overhead that you have noticed.
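
To get a rough feel for that start-up cost, a sketch along these lines could time how long it takes to create a pool and run one trivial task under each available start method. The helper name `time_pool_startup` and the worker count are my own assumptions for illustration, not part of your code. The builtin `abs` is used as the task so that the workers do not need to import anything from the script itself.

```python
import multiprocessing as mp
import time


def time_pool_startup(method: str, workers: int = 2) -> float:
    """Return the seconds taken to start a pool and run one trivial map."""
    ctx = mp.get_context(method)  # context bound to a single start method
    start = time.perf_counter()
    with ctx.Pool(processes=workers) as pool:
        # abs is a builtin, so workers can unpickle it without importing
        # anything from this script.
        pool.map(abs, range(-workers, 0))
    return time.perf_counter() - start


if __name__ == "__main__":
    # The guard is required with spawn: each worker re-imports the main module.
    for method in mp.get_all_start_methods():
        print(f"{method}: {time_pool_startup(method):.2f}s")
```

On Linux you can pass `"spawn"` to compare it with `"fork"` on the same machine. The numbers only cover pool start-up, not the per-row work in `process_product`, so treat them as a rough indication rather than a full profile.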

The question linked in the comment above provides a much more complete explanation. You can also look at the docs for a better understanding: https://docs.python.org/3/library/multiprocessing.html?highlight=process#the-spawn-and-forkserver-start-methods

D.L