I need to calculate the total time required to complete a number of tasks. Details:

  • 5 tasks total. Time estimates (in seconds) for each: [30, 10, 15, 20, 25]
  • Concurrency: 3 tasks at a time

How can I calculate the total time it will take to process all tasks, given the concurrency? I know it will take at least as long as the longest task (30 seconds), but is there a formula or method to calculate a rough total estimate that will scale as more tasks are added?

dominic
  • There are not so many combinations, so pen and paper can easily solve this (and it does not require any advanced skills). The time depends on the scheduling. A naive solution is to execute everything serially (worst case). The general problem is NP-complete and is called [Bin Packing](https://en.wikipedia.org/wiki/Bin_packing_problem), though there are good approximations for it. – Jérôme Richard Jul 22 '22 at 23:12
  • @dominic, does the heuristic I proposed solve your issue? – Louis Lac Jul 31 '22 at 20:12
  • @LouisLac thank you for the answer! I actually ended up following another approach and think I overcomplicated it a bit, but your solution would work great in a case where the average task duration is usable. The bin packing algorithm assumes a "bin" has a fixed size but that there can be unlimited bins. In my scenario, however, we have a limited number of bins but unlimited size for each bin. Further, each task can vary widely in length, so the average doesn't work. – dominic Aug 22 '22 at 18:22
  • @LouisLac Luckily, I was able to simply divide the total cumulative task time by the number of workers to get an estimate. Then, if the number of tasks is smaller than the concurrency, the estimate is instead just the duration of the longest task (see the sketch just below). – dominic Aug 22 '22 at 18:23
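
A minimal sketch of the estimate dominic describes, folding both cases into a single max() (my formulation, not code from the thread):

def rough_estimate(durations: list[float], nb_workers: int) -> float:
    # Total work spread evenly across the workers, floored by the longest
    # task, which is a hard lower bound on the finish time. When there are
    # fewer tasks than workers, the longest task always dominates.
    return max(sum(durations) / nb_workers, max(durations))

print(rough_estimate([30, 10, 15, 20, 25], 3))  # ~33.33 seconds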

1 Answer

If you don't mind making some approximations, it can be quite simple. If the tasks take roughly the same time to complete, you can use the average task duration as a basis (here, 20 seconds).

Assuming that the workers are always kept busy, that individual task durations are small relative to the total, that there are many tasks, and that the concurrency level is high enough, then:

estimated_duration = average_task_duration * nb_tasks / nb_workers

Where nb_workers is the number of concurrent threads.

Here is some Python code that shows the idea:

from random import random
from time import sleep, monotonic
from concurrent.futures import ThreadPoolExecutor

def task(i: int, duration: float):
    # Simulate a unit of work that takes `duration` seconds.
    sleep(duration)

def main():
    nb_tasks = 20
    nb_workers = 3
    average_task_duration = 2.0
    expected_duration = nb_tasks * average_task_duration / nb_workers

    # Durations vary uniformly by +/- 0.5 s around the average.
    durations = [average_task_duration + (random() - 0.5) for _ in range(nb_tasks)]

    print(f"Starting work... Expected duration: {expected_duration:.2f} s")
    start = monotonic()
    with ThreadPoolExecutor(max_workers=nb_workers) as executor:
        for i, d in enumerate(durations):
            executor.submit(task, i, d)
    # Exiting the `with` block waits for all submitted tasks to finish.
    stop = monotonic()

    print(f"Elapsed: {(stop - start):.2f} s")


if __name__ == "__main__":
    main()

If these hypotheses do not hold in your case, you'd be better off using a bin-packing-style algorithm, as Jérôme suggested.
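
For a tighter estimate that still avoids solving the NP-complete problem exactly, you can simulate a greedy longest-processing-time (LPT) schedule: sort the tasks by descending duration and always hand the next task to the worker that frees up first. A minimal sketch of that idea (my addition, not part of the original answer):

import heapq

def estimate_makespan(durations: list[float], nb_workers: int) -> float:
    # Min-heap of each worker's accumulated busy time.
    finish_times = [0.0] * nb_workers
    heapq.heapify(finish_times)
    for d in sorted(durations, reverse=True):
        # Give the next-longest task to the worker that frees up first.
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    # The schedule finishes when the busiest worker does.
    return max(finish_times)

print(estimate_makespan([30, 10, 15, 20, 25], 3))  # 35.0

On the question's numbers this predicts 35 seconds, a bit above the 100/3 ≈ 33.3-second lower bound, because the work cannot be split perfectly across three workers.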

Louis Lac