
I wrote some simple multiprocessing code, but I don't think it is working.

When I ran this code on my laptop and checked the processors in the Activity Monitor app, it showed that several of them were busy. I then ran the same code on a workstation (up to 28 cores, of which I used 24) and checked again in Task Manager. But CPU usage did not increase; only the number of processes increased.

import multiprocessing

# Multiprocessing

def multi(input_field):
    result = subvolume.label(input_field)
    return result

test_list = [resampled_sub_1, resampled_sub_2, resampled_sub_3,
             resampled_sub_4, resampled_sub_5]

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=24)
    results = pool.map(multi, test_list)
    pool.close()
    pool.join()

If multiprocessing were working correctly, I would expect CPU usage to be higher than this. What did I do wrong?

Newbie0105

1 Answer


You have 24 processes in your pool, but your iterable test_list has only 5 items in it. If you take calc_chunksize_info() from my answer here, you can calculate how many chunks are generated and distributed:

calc_chunksize_info(n_workers=24, len_iterable=5)
# Out: Chunkinfo(n_workers=24, len_iterable=5, n_chunks=5, chunksize=1, last_chunk=1)

Chunksize will be 1, so up to five worker-processes could run in parallel. There are simply not enough items in your input-iterable to employ all worker-processes.
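The arithmetic behind those numbers can be sketched in a few lines. This is not the calc_chunksize_info() referenced above but a hypothetical stand-in, estimate_chunks(), written under the assumption that Pool.map() is called without an explicit chunksize and that CPython's default heuristic applies (aim for roughly four chunks per worker, rounding the chunk size up):

```python
import math
from collections import namedtuple

# Hypothetical result type for this sketch; the field names mirror the
# output shown above but are not taken from any library.
ChunkEstimate = namedtuple(
    "ChunkEstimate", "n_workers len_iterable n_chunks chunksize last_chunk"
)


def estimate_chunks(n_workers, len_iterable, factor=4):
    """Estimate how Pool.map() splits an iterable when chunksize is None.

    Assumes CPython's default heuristic: divide the iterable into about
    `factor` chunks per worker and round the chunk size up.
    """
    if len_iterable == 0:
        return ChunkEstimate(n_workers, 0, 0, 0, 0)
    chunksize, extra = divmod(len_iterable, n_workers * factor)
    if extra:
        chunksize += 1
    n_chunks = math.ceil(len_iterable / chunksize)
    # The last chunk holds the remainder, or a full chunk if it divides evenly.
    last_chunk = len_iterable % chunksize or chunksize
    return ChunkEstimate(n_workers, len_iterable, n_chunks, chunksize, last_chunk)


if __name__ == '__main__':
    print(estimate_chunks(n_workers=24, len_iterable=5))
```

With 24 workers and 5 items this yields 5 chunks of size 1, so at most 5 workers can ever receive a task; the other 19 stay idle.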

As a side note: test_list should be defined within the if __name__ == '__main__':-block.
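A minimal sketch of that layout, assuming a hypothetical stand-in worker label_field() and dummy inputs from build_inputs(), because subvolume.label and the resampled sub-volumes are not shown in the question. The inputs are created inside the guarded block, and the pool size is capped at the number of tasks:

```python
import multiprocessing


def label_field(field):
    """Hypothetical stand-in for subvolume.label(); just sums the values."""
    return sum(field)


def build_inputs(n_items=5):
    """Create dummy inputs; the real resampled sub-volumes are not shown."""
    return [list(range(i + 1)) for i in range(n_items)]


def run(inputs, max_workers=24):
    # More workers than tasks only adds start-up cost, so cap the pool size.
    n_workers = min(max_workers, len(inputs))
    with multiprocessing.Pool(processes=n_workers) as pool:
        return pool.map(label_field, inputs)


if __name__ == '__main__':
    # Inputs are built here, so worker processes that re-import this module
    # (the "spawn" start method, the default on Windows) do not rebuild them.
    test_list = build_inputs()
    print(run(test_list))
```

Anything left at module level is executed again in every worker that re-imports the main module, which matters when building the inputs is expensive.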

Darkonaut
  • Following your advice, I defined test_list under the if statement, and it now has more than 24 items (35). But on the workstation, CPU usage is still at 1% in Task Manager. With this code on my laptop, I can see the CPUs working (about 50% used overall). I don't know what the problem is on the workstation. – Newbie0105 Jun 18 '19 at 17:24
  • @Newbie0105 But it runs successfully without raising an error, right? How long does it take on your workstation? – Darkonaut Jun 18 '19 at 17:41
  • Yes, there is no error message when running this code. I don't know the exact running time because it doesn't complete. I tested 4 items on my laptop with 4 processes, and it is still running (2 hours so far). Can I run this code on the workstation, even though CPU usage is only about 1%? – Newbie0105 Jun 18 '19 at 17:50
  • @Newbie0105 I'm afraid I can't tell you without knowing all the details. There are simply too many factors in play: it depends a lot on what you're actually doing, whether your tasks involve I/O, how much data you send relative to the processing time, and so on. Check whether you run into memory issues with that many processes, and if so, try halving the number of processes. – Darkonaut Jun 18 '19 at 18:00
  • @Newbie0105 Another possibility is that some tasks raise an exception. Pool won't terminate when that happens; it continues to compute the remaining tasks, only to throw away the whole pool.map() result in the end ([example](https://stackoverflow.com/q/55024997/9059420)). If your tasks are supposed to be long-running anyway, you might want to switch from `pool.map()` to `pool.apply_async()` and specify an `error_callback` to be informed about errors as soon as they occur ([example](https://stackoverflow.com/questions/52272325/python-multiprocessing-abort-map-on-first-child-error/52285247#52285247)). – Darkonaut Jun 18 '19 at 18:25
  • @Newbie0105 Or you could use [`concurrent.futures.ProcessPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor), which immediately fails on error. – Darkonaut Jun 18 '19 at 18:27
  • The problem was the Jupyter notebook, and I found a way around it. I tested the code on a desktop (Windows). I think this code will work on the workstation tomorrow. Thanks a lot!! – Newbie0105 Jun 19 '19 at 04:09
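
The `pool.apply_async()` suggestion from the comments above can be sketched roughly as follows. The task risky_task() and the callback report_error() are hypothetical names made up for illustration, and the failing input is contrived; only `apply_async()`, its `error_callback` parameter, and `AsyncResult.successful()` come from the standard library:

```python
import multiprocessing


def risky_task(x):
    """Hypothetical task that fails for one specific input."""
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x * x


def report_error(exc):
    # Runs in the parent process once a task has raised an exception.
    print(f"task failed: {exc!r}")


def run_all(inputs, n_workers=4):
    with multiprocessing.Pool(processes=n_workers) as pool:
        pending = [
            pool.apply_async(risky_task, (x,), error_callback=report_error)
            for x in inputs
        ]
        pool.close()
        pool.join()  # wait until every submitted task has finished
    # Keep only the results of tasks that finished without an exception.
    return [r.get() for r in pending if r.successful()]


if __name__ == '__main__':
    print(run_all(range(6)))
```

Unlike a single `pool.map()` call, this reports each failure when it happens and still keeps the results of the tasks that succeeded.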