I'm trying to take advantage of multiprocessing in Python, so I ran some tests and found that the multiprocessing code runs much slower than the plain version. What am I doing wrong?
Here is the test script:
    import numpy as np
    from datetime import datetime
    from multiprocessing import Pool

    def some_func(argv):
        x = argv[0]
        y = argv[1]
        return np.sum(x * y)

    def other_func(argv):
        x = argv[0]
        y = argv[1]
        f1 = np.fft.rfft(x)
        f2 = np.fft.rfft(y)
        CC = np.fft.irfft(f1 * np.conj(f2))
        return CC

    N = 20000
    X = np.random.randint(0, 10, size=(N, N))
    Y = np.random.randint(0, 10, size=(N, N))

    output_check = np.zeros(N)
    D1 = datetime.now()
    for k in range(len(X)):
        output_check[k] = np.max(some_func((X[k], Y[k])))
    print('Plain: ', datetime.now() - D1)

    output = np.zeros(N)
    D1 = datetime.now()
    with Pool(10) as pool:  # CPUs
        for ind, res in enumerate(pool.imap(some_func, zip(X, Y), chunksize=1)):
            output[ind] = np.max(res)
        pool.close()
        pool.join()
    print('Pool: ', datetime.now() - D1)
Output:
Plain: 0:00:00.904062
Pool: 0:00:15.386251
Why is the difference so big? What is consuming the time?
I have 80 CPUs available and have tried different pool sizes and chunksizes...
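Roughly, what I varied looks like this (just a sketch of the loop I used; N, X, Y and some_func are from the script above, and the exact worker/chunksize combinations differed):

    from datetime import datetime
    from multiprocessing import Pool
    import numpy as np

    # N, X, Y and some_func as defined in the test script above;
    # the (workers, chunksize) pairs below are only examples of what I tried
    for n_workers, chunk in [(10, 1), (40, 50), (80, 250)]:
        output = np.zeros(N)
        D1 = datetime.now()
        with Pool(n_workers) as pool:
            for ind, res in enumerate(pool.imap(some_func, zip(X, Y), chunksize=chunk)):
                output[ind] = np.max(res)
        print(n_workers, chunk, datetime.now() - D1)

None of the combinations made the pooled version faster than the plain loop.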
The actual function is more complex (similar to other_func above); with it the plain and parallel versions take almost the same time, so still no speed-up :(
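For reference, I timed other_func with the same harness, roughly like this (sketch; other_func, X, Y and N are from the script above):

    from datetime import datetime
    from multiprocessing import Pool
    import numpy as np

    # other_func, X, Y and N as defined in the test script above
    output_check = np.zeros(N)
    D1 = datetime.now()
    for k in range(len(X)):
        output_check[k] = np.max(other_func((X[k], Y[k])))
    print('Plain: ', datetime.now() - D1)

    output = np.zeros(N)
    D1 = datetime.now()
    with Pool(10) as pool:
        for ind, res in enumerate(pool.imap(other_func, zip(X, Y), chunksize=1)):
            output[ind] = np.max(res)
    print('Pool: ', datetime.now() - D1)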
The input is a BIG 3D numpy array, and I need pairwise convolutions of its elements.
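To give an idea of the structure (purely hypothetical shapes and names, the real array is much bigger; xcorr here is the same FFT-based operation as other_func):

    import numpy as np

    # hypothetical shapes, only to illustrate the structure of the real problem
    M, K, L = 50, 100, 4096
    data = np.random.randn(M, K, L)

    def xcorr(pair):
        # same FFT-based circular cross-correlation as other_func above
        x, y = pair
        return np.fft.irfft(np.fft.rfft(x) * np.conj(np.fft.rfft(y)))

    # pairwise over the traces of one block (simplified)
    result = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            result[i, j] = np.max(xcorr((data[0, i], data[0, j])))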