
I have the following code and I want to spread the task across multiple processes. After experimenting, I realized that increasing the number of CPU cores negatively impacts the execution time.

I have 8 cores on my machine:

  • Case 1 (without multiprocessing): execution time 106 minutes
  • Case 2 (multiprocessing, ncores = 4): execution time 37 minutes
  • Case 3 (multiprocessing, ncores = 7): execution time 40 minutes

Here is the code:

import time
import functools  # needed for functools.partial below
import multiprocessing as mp


def _fun(i, args1=10):
    # Sort matrix W
    # For loop 1 on matrix M
    # For loop 2 on matrix Y
    # (actual computation elided)
    return value

def run1(ncores=mp.cpu_count()):
    ncores = ncores - 4  # subtract 4 or 1 to get ncores = 4 or 7 on an 8-core machine
    _f = functools.partial(_fun, args1=x)  # x is defined elsewhere
    with mp.Pool(ncores) as pool:
        result = pool.map(_f, range(n))  # n is defined elsewhere
    return result  # pool.map already returns a list


start = time.time()
list1 = run1()
end = time.time()
print('time {0} minutes'.format((end - start) / 60))

My question: what is the best practice for using multiprocessing? My understanding was that the more CPU cores we use, the faster the code should run.
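For reference, here is a minimal, self-contained sketch of the same timing experiment with a dummy CPU-bound workload, so the effect can be reproduced. The busy-loop _work function and the task count n = 21000 are placeholders, not the real matrix code:

import time
import multiprocessing as mp


def _work(i):
    # Dummy CPU-bound placeholder for the real matrix computation.
    total = 0
    for k in range(200000):
        total += (i + k) % 7
    return total


def timed_run(ncores, n=21000):
    start = time.time()
    with mp.Pool(ncores) as pool:
        pool.map(_work, range(n))
    return (time.time() - start) / 60


if __name__ == '__main__':
    # The __main__ guard matters: with the 'spawn' start method
    # (Windows, and macOS since Python 3.8) each worker re-imports
    # this module, and an unguarded Pool would spawn endlessly.
    for ncores in (1, 4, 7):
        print('ncores={0}: {1:.2f} minutes'.format(ncores, timed_run(ncores)))

Each extra worker adds fixed costs (process startup, pickling arguments and results through inter-process communication), so once those costs dominate the per-task work, adding cores stops helping and can even slow things down, which would match the 4-core vs. 7-core numbers above.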

H.H
  • Multiprocessing always creates additional overhead. It is not always effective, and it really depends on how you do it. For that reason, https://stackoverflow.com/questions/24376462/why-multiprocessing-is-slow and https://stackoverflow.com/questions/20727375/multiprocessing-pool-slower-than-just-using-ordinary-functions are related. – Confused Learner May 18 '21 at 16:10
  • @ConfusedLearner, does pool.map(_f, range(n)) create a new process for each i in range(n)? Or does it initially create ncores processes and then pass each value i in range(n) to one of them? – H.H May 18 '21 at 21:02
  • `Pool.map` takes a list of tasks and splits it into a number of batches equal to the number of cores. However, the splitting can take a very long time if the list is very big, because it tries to find the optimal split. You could try to split your list of tasks manually (see the chunksize sketch after this comment thread). Additionally, `range` is lazy, so it has to be run through to the end before the tasks can actually be split. – RaJa May 19 '21 at 05:18
  • @RaJa So it will only initiate a number of tasks equal to the number of cores once, and pass a batch of the data (from range(n)) to each task? Is that right? Or does it initiate one task per batch, destroy the task when it has finished its batch, and create a new task for the next batch? – H.H May 19 '21 at 07:16
  • Assume you have 4 cores: `Pool` will create 4 threads/processes and split your list into 4 batches. Each process then gets a batch, does the work, and closes itself. – RaJa May 19 '21 at 10:06
  • @RaJa Thanks. So if my data has length 21k and I use 4 cores, why does it take less time than with 8 cores? – H.H May 19 '21 at 10:09
  • Most likely it should not. But if your 21k objects are complex (not just numbers), then splitting them into 8 batches might take longer than splitting into 4 batches. Do you have 8 physical cores, or is this a 4-core CPU with hyper-threading? – RaJa May 19 '21 at 12:46
  • @RaJa print(multiprocessing.cpu_count()) prints 8. – H.H May 19 '21 at 12:48
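Regarding the batching RaJa describes: `Pool.map` already chops the iterable into chunks, and the batch size can be controlled directly through the chunksize argument that Pool.map accepts. Here is a minimal sketch of that manual splitting; the task count and chunk size are illustrative only, not taken from the question:

import multiprocessing as mp


def _work(i):
    # Placeholder for the real per-item computation.
    return i * i


if __name__ == '__main__':
    n = 21000
    with mp.Pool(4) as pool:
        # Default: Pool.map picks a chunksize of roughly
        # len(iterable) / (4 * number_of_workers).
        default_split = pool.map(_work, range(n))

        # Explicit chunksize: each worker receives batches of 1000
        # tasks, which reduces inter-process messaging for cheap
        # tasks at the cost of coarser load balancing.
        manual_split = pool.map(_work, range(n), chunksize=1000)

    assert default_split == manual_split

Larger chunks mean fewer pickling round-trips between the parent and the workers, which helps when individual tasks are cheap; smaller chunks balance load better when task durations vary.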

0 Answers