I've got a new server with two Intel Xeon Gold 6138 CPUs, each with 20 cores / 40 threads, so 40 cores / 80 threads in total.
I'm testing it with a very simple task: no IO, just pure computation. But per-thread efficiency decays really fast as I add workers. Here's the script:
import numpy as np
from datetime import datetime as dt
from multiprocessing import Pool

def trytrytryshare(i, times):
    # each worker repeats the same elementwise multiply; the result is
    # deliberately discarded -- pure computation, no IO
    for j in range(times):
        indata[0] * indata[1]
    return

def trymultishare(thread=70, times=10):
    # "thread" here is really the number of worker *processes*:
    # multiprocessing.Pool forks processes, not threads
    st = dt.now()
    args_l = [(i, times) for i in range(thread)]
    print(st)
    p = Pool(thread)
    for i in range(len(args_l)):
        p.apply_async(func=trytrytryshare, args=args_l[i])
    p.close()
    p.join()
    print('%d threads finished in %d secs' % (thread, (dt.now() - st).seconds))
    return

if __name__ == '__main__':
    size = 10000
    x = np.random.rand(size, size)
    y = np.random.rand(size, size)
    # module-level tuple, inherited by the forked workers on Linux
    indata = (x, y)
    for i in range(1, 71, 10):
        trymultishare(thread=i, times=20)
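For scale, here's how big the shared data actually is (simple arithmetic, easy to verify): each input array is size × size float64, so at size = 10000 that's ~0.8 GB per array, and every indata[0] * indata[1] streams roughly 2.4 GB through memory (two arrays read, one temporary written).

import numpy as np

x = np.random.rand(10000, 10000)
print(x.nbytes / 1e9)      # ~0.8 GB per input array (10000 * 10000 * 8 bytes)
print(3 * x.nbytes / 1e9)  # ~2.4 GB of memory traffic per multiply (2 reads + 1 write)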
One thread costs about 7 seconds, so I was expecting 80 workers to take 7 secs or slightly more, but it takes 140 secs (see the linked result screenshot), so per-thread performance decayed a whopping 95%!
Is this standard, or am I doing something wrong? Trying to understand why it decays so much...
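To sanity-check the single-thread baseline, the 20 multiplies that one worker performs can be timed on their own, e.g. with the stdlib timeit module (standalone sketch, same array sizes as above):

import timeit
import numpy as np

# same data shape as the script above; one worker does 20 of these multiplies
indata = (np.random.rand(10000, 10000), np.random.rand(10000, 10000))
print(timeit.timeit('indata[0] * indata[1]', globals=globals(), number=20))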
Thanks, guys!