
Say I have a set of 20 CPU-heavy tasks (~1 hour each, but some take a lot longer) that are run by calling a function via e.g. `Pool.apply_async(function, task_list)`. The PC has 12 cores, so I can spread the load and use all 12 of them.

The result of each task could require that a new task has to be run (some tasks might need 1 new run, others maybe 10).

When a new task is required, I would like to spawn it into the existing pool's task list, to keep CPU usage fully optimized at all times.

Currently I run the 20 tasks, wait for them to finish, start the ~18 new tasks, wait for those to finish, start the remaining new tasks, and so on. Meanwhile, only 1 core is busy for an hour instead of 12, which adds up to a loss of several hours to days of calculation time. (I could run the follow-up task in the same worker, but this results in an even larger loss.)

With `Pool` it does not seem possible to add more tasks to the pool while it is running. Is that correct, or are there some smart ways to do this that I missed while searching all over the place?

(The only option I see is to use `Process` instead of `Pool`: make a while loop over a dynamic list that starts each task as a single process, allow at most 12 processes to run at the same time, put each task or new task in the dynamic list, and remove a task when it is sent to a process.)
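Roughly what I have in mind is the sketch below (untested; `heavy_task`, the follow-up condition, and the polling interval are placeholders made up for illustration):

```python
import multiprocessing as mp
import queue
import time

MAX_WORKERS = 12  # one process per core

def heavy_task(task, followup_queue):
    """Stand-in for the ~1 hour CPU-bound function. Any follow-up
    tasks the result requires are pushed onto followup_queue so the
    scheduling loop below can pick them up."""
    result = task * task          # dummy work
    if result < 100:              # dummy condition for spawning new work
        followup_queue.put(result)

def run_all(initial_tasks):
    pending = list(initial_tasks)  # the dynamic task list
    followups = mp.Queue()         # workers report new tasks here
    running = []

    while pending or running:
        # drop finished processes from the running list
        running = [p for p in running if p.is_alive()]

        # move any follow-up tasks the workers produced into the list
        try:
            while True:
                pending.append(followups.get_nowait())
        except queue.Empty:
            pass

        # start new processes, up to the core limit
        while pending and len(running) < MAX_WORKERS:
            p = mp.Process(target=heavy_task,
                           args=(pending.pop(0), followups))
            p.start()
            running.append(p)

        time.sleep(0.5)            # cheap polling; the tasks run for hours

if __name__ == '__main__':
    run_all(range(20))
```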

uytda
  • `apply_async` is for single-function-call jobs and uses one process; did you mean `map_async`? You can send new jobs into an existing pool _before_ all tasks are finished with an async method. They also offer registering callback functions for when the result is ready. Easier for your case would be to put everything needed to complete a task into one function (skipping the resubmission of another task) and use `pool.map` with `chunksize=1`. Highly relevant background on Pool's chunksize can be found [here](https://stackoverflow.com/a/54032744/9059420). – Darkonaut Jan 24 '19 at 17:44
  • thanks, it should indeed be a map option, and chunksize definitely needs to be 1. Keeping the task in the same function creates the risk that the last started worker function runs for 10 hours while the others are idle. Your suggestion that the task_list can be increased, combined with the answer by @asafpr, helped me understand the `Queue()` feature, so my current guess is that using the task_list as a `Queue` for the 'args' in the pool, and adding tasks to it, should work. I found an example that worked using 'Process' instead of 'Pool'; I will update and clean it up later this weekend, hopefully. – uytda Jan 26 '19 at 08:05
  • The Python docs also suggest this: https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes (first example when opening the link) – uytda Jan 26 '19 at 08:06
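(To make Darkonaut's first point above concrete: new jobs can be submitted into a still-running pool with `apply_async` callbacks. A minimal, untested sketch follows; `Scheduler`, `heavy_task`, and the follow-up rule are made-up names for illustration, not from the thread.)

```python
import multiprocessing as mp
import threading
import time

def heavy_task(task):
    # stand-in for the CPU-bound work; returns a list of follow-up tasks
    return [task + 100] if task < 5 else []

class Scheduler:
    def __init__(self, processes):
        self.pool = mp.Pool(processes)
        self.lock = threading.Lock()   # callbacks run in another thread
        self.outstanding = 0

    def submit(self, task):
        with self.lock:
            self.outstanding += 1
        self.pool.apply_async(heavy_task, (task,), callback=self._done)

    def _done(self, new_tasks):
        # runs in the pool's result-handler thread; the pool is still
        # alive, so follow-up tasks can be submitted right here.
        # Submit before decrementing so the outstanding count never
        # dips to zero while work remains.
        for t in new_tasks:
            self.submit(t)
        with self.lock:
            self.outstanding -= 1

    def wait(self):
        while True:
            with self.lock:
                if self.outstanding == 0:
                    break
            time.sleep(0.5)
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    sched = Scheduler(12)
    for t in range(20):
        sched.submit(t)
    sched.wait()
```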

1 Answer


You could use a Queue; you can see an example here: https://www.journaldev.com/15631/python-multiprocessing-example. This way you'll be able to add to the Queue and have a constant number of runners taking tasks from it.
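A rough, untested sketch of that pattern (`heavy_task` and its follow-up condition are dummies; I use `multiprocessing.JoinableQueue` here so the main process can tell when all work, including re-queued work, is finished): a fixed set of worker processes pulls from a shared queue, and any worker can put follow-up tasks back onto the same queue, so all 12 runners stay busy as long as work exists.

```python
import multiprocessing as mp

NUM_WORKERS = 12  # one runner per core

def heavy_task(task):
    # stand-in for the ~1 hour function; returns any follow-up tasks
    return [task + 100] if task < 5 else []

def worker(task_queue):
    while True:
        task = task_queue.get()
        if task is None:                  # sentinel: shut this worker down
            task_queue.task_done()
            break
        for new_task in heavy_task(task):
            task_queue.put(new_task)      # add new work to the live queue
        task_queue.task_done()            # after re-queuing, so join() waits

if __name__ == '__main__':
    q = mp.JoinableQueue()
    workers = [mp.Process(target=worker, args=(q,))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()

    for task in range(20):                # the initial 20 tasks
        q.put(task)

    q.join()         # blocks until every task, including follow-ups, is done
    for _ in workers:
        q.put(None)  # release the workers
    for w in workers:
        w.join()
```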

asafpr