Say I have a set of 20 CPU-heavy tasks (~1 hour each, but some take a lot longer) that are run by calling a function via e.g. Pool.apply_async(function, task_list). My PC has 12 cores, so I can spread the load and use all 12 of them.
The result of each task may require that one or more new tasks be run (some tasks might need 1 new run, others maybe 10).
When a new task is required, I would like to spawn it into the existing pool's task list, so that CPU usage stays fully optimized at all times.
Currently I run the 20 tasks and wait for them to finish, then start the ~18 new tasks and wait for those, then start the remaining new tasks, and so on. Meanwhile, for an hour at a time only 1 core is being used instead of 12, which adds up to a loss of several hours to days of calculation time. (I could run the follow-up task in the same worker, but that results in an even larger loss.)
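For concreteness, a minimal sketch of this batch-wise pattern. run_task is a hypothetical stand-in for the real ~1 h function; here each task just reports one follow-up task counting its value down to 0:

```python
from multiprocessing import Pool

# Hypothetical stand-in for the real CPU-heavy function: it returns the
# list of follow-up tasks its result requires (possibly empty).
def run_task(task):
    return [task - 1] if task > 0 else []

def run_in_batches(initial_tasks, processes=12):
    tasks = list(initial_tasks)
    finished = []
    with Pool(processes=processes) as pool:
        while tasks:
            # Submit the current batch and block until ALL of it is done.
            # Near the end of each batch, most cores sit idle waiting on
            # the slowest remaining task -- the inefficiency described above.
            async_results = [pool.apply_async(run_task, (t,)) for t in tasks]
            new_tasks = []
            for res in async_results:
                new_tasks.extend(res.get())
            finished.extend(tasks)
            tasks = new_tasks  # next batch starts only after this one ends
    return finished
```

With the toy run_task above, run_in_batches([3, 1], processes=2) processes the batches [3, 1], then [2, 0], then [1], then [0]; each boundary between batches is a point where cores go idle.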
With Pool it does not seem possible to add more tasks to the pool once it is running. Is that correct, or are there some smart ways to do this that I missed while searching all over the place?
(The only option I see is to use Process instead of Pool, and make a while loop over a dynamic list that starts each task as a single process, while only allowing up to 12 processes to run at the same time; each task, or new task, is put into the dynamic list and removed from it once it has been sent to a process.)
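A rough sketch of that Process-based workaround, reusing the same hypothetical run_task idea, with a Queue carrying follow-up tasks back to the parent (all names here are illustrative, not from the real code):

```python
import multiprocessing as mp

# Hypothetical stand-in for the real task: each worker sends exactly ONE
# message back -- the list of follow-up tasks (possibly empty).
def run_task(task, queue):
    queue.put([task - 1] if task > 0 else [])

def dynamic_scheduler(initial_tasks, max_workers=12):
    pending = list(initial_tasks)   # the dynamic task list
    queue = mp.Queue()              # workers report follow-up tasks here
    procs = []
    in_flight = 0                   # tasks started but not yet reported back
    done = 0
    while pending or in_flight:
        # Fill free "slots" while work remains (at most max_workers at once).
        while pending and in_flight < max_workers:
            p = mp.Process(target=run_task, args=(pending.pop(0), queue))
            p.start()
            procs.append(p)
            in_flight += 1
        # Block until ANY task reports; its follow-ups join the dynamic
        # list immediately, so a freed core gets new work right away.
        pending.extend(queue.get())
        in_flight -= 1
        done += 1
    for p in procs:
        p.join()
    return done
```

Counting in_flight messages rather than polling process liveness avoids a race where a worker has reported its result but not yet exited; the trade-off is that a 13th process may briefly start while an old one is still shutting down.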