
I have been working with the multiprocessing module trying to parallelise a for loop that takes 27 min to run on a single core. I have 12 CPU cores at my disposal.

The meat of the code I am trying to parallelise is given below:

import multiprocessing as mp
import pandas as pd

def Parallel_Work(val, b, c):
    # Filter based on val to make a dataframe and do some work on it
    ...

values = pd.Series(["CompanyA",
                    "CompanyB",
                    "CompanyC"])  # Actual values list is quite big

with mp.Pool(processes=4) as pool:
    results = [pool.apply(Parallel_Work,
                          args=(val, b, c))
               for val in values.unique()]

When I run this code, I have run across two things that I haven't been able to figure out:

  1. None of the processes runs at 100% CPU usage. In fact, the combined CPU usage of all processes sums to 100% every time (screenshot of the `top` output attached). Are the processes really using different cores? If not, how do I make sure they do?

  2. There are 4 processes spawned, but only 2 are active at any given point in time. Am I missing something here?

Please let me know if I can provide any more information.

  • Could you provide more information please? For example, what's in `values`? Does `values.unique()` return more than 2 items? – Emrah Diril Feb 07 '20 at 06:26
  • @EmrahDiril I have made the edit. Values is essentially a pandas Series. values.unique() will return the series after removing duplicates. No, it doesn't return more than two items. – Mohit Munjal Feb 07 '20 at 06:37
  • @EmrahDiril It does return more than two items. Sorry for the confusion. Values.unique is a pandas Series of 700 strings. – Mohit Munjal Feb 07 '20 at 07:24
  • no problem, I updated my answer – Emrah Diril Feb 07 '20 at 07:25

1 Answer


I think you need to use apply_async instead of apply, which blocks until the result is ready. Because apply waits for each call to finish before submitting the next one, your loop effectively runs the tasks one at a time, which explains why the pool never saturates more than one core.

See this SO question for details on apply, apply_async and map
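A minimal sketch of the non-blocking pattern, using a placeholder work function since the body of Parallel_Work isn't shown in the question:

```python
import multiprocessing as mp

def parallel_work(val, b, c):
    # Placeholder for the real work: filter on val, process, return a result.
    return f"{val}-{b}-{c}"

if __name__ == "__main__":
    values = ["CompanyA", "CompanyB", "CompanyC"]
    b, c = "x", "y"
    with mp.Pool(processes=4) as pool:
        # apply_async returns an AsyncResult immediately, so all tasks are
        # submitted up front and run concurrently across the pool's workers.
        async_results = [pool.apply_async(parallel_work, args=(val, b, c))
                         for val in values]
        # Collect the results; .get() blocks only until that task finishes.
        results = [r.get() for r in async_results]
    print(results)
```

Since b and c are the same for every call, pool.starmap(parallel_work, [(val, b, c) for val in values]) would achieve the same thing with less bookkeeping.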

Emrah Diril
    Thanks for replying. values.unique() is a pandas Series of 700 strings. So it does return more than 2 items and is fairly larger than the no. of processes I am spawning. There is no I/O happening in the Parallel_Work function. It just manipulates a dataframe(b) that is passed to it and then returns a pandas Series. – Mohit Munjal Feb 07 '20 at 07:18