I've got a new server with two Intel Xeon Gold 6138 CPUs, each with 20 cores / 40 threads, so 40 cores / 80 threads in total.
I'm testing it with a very simple task: no IO, just pure computation. But per-thread efficiency decays really fast as I add workers. Here's the script:
import numpy as np
from datetime import datetime as dt
from multiprocessing import Pool

def trytrytryshare(i, times):
    # each worker repeats the same elementwise multiply; the result is
    # deliberately discarded -- pure computation, no IO
    for j in range(times):
        indata[0] * indata[1]
    return

def trymultishare(thread=70, times=10):
    # "thread" here is really the number of worker *processes*:
    # multiprocessing.Pool forks processes, not threads
    st = dt.now()
    args_l = [(i, times) for i in range(thread)]
    print(st)
    p = Pool(thread)
    for i in range(len(args_l)):
        p.apply_async(func=trytrytryshare, args=args_l[i])
    p.close()
    p.join()
    print('%d threads finished in %d secs' % (thread, (dt.now() - st).seconds))
    return

if __name__ == '__main__':
    size = 10000
    x = np.random.rand(size, size)
    y = np.random.rand(size, size)
    # module-level tuple, inherited by the forked workers on Linux
    indata = (x, y)
    for i in range(1, 71, 10):
        trymultishare(thread=i, times=20)
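For scale, here's how big the shared data actually is (simple arithmetic, easy to verify): each input array is size × size float64, so at size = 10000 that's ~0.8 GB per array, and every indata[0] * indata[1] streams roughly 2.4 GB through memory (two arrays read, one temporary written).

import numpy as np

x = np.random.rand(10000, 10000)
print(x.nbytes / 1e9)      # ~0.8 GB per input array (10000 * 10000 * 8 bytes)
print(3 * x.nbytes / 1e9)  # ~2.4 GB of memory traffic per multiply (2 reads + 1 write)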
One thread costs about 7 seconds, so I was expecting 80 workers to take 7 secs or slightly more, but it takes 140 secs (see the linked result screenshot), so per-thread performance decayed a whopping 95%!
Is this standard, or am I doing something wrong? Trying to understand why it decays so much...
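To sanity-check the single-thread baseline, the 20 multiplies that one worker performs can be timed on their own, e.g. with the stdlib timeit module (standalone sketch, same array sizes as above):

import timeit
import numpy as np

# same data shape as the script above; one worker does 20 of these multiplies
indata = (np.random.rand(10000, 10000), np.random.rand(10000, 10000))
print(timeit.timeit('indata[0] * indata[1]', globals=globals(), number=20))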
Thanks, guys!