This is a design question rather than a code-related one. It concerns thread pools, specifically how to organize the execution of tasks. I am using C++ and Boost.Thread in a cross-platform scenario.
I have groups of tasks that I need to parallelize. There are X groups, and each group fires off some number of subtasks. All of a group's subtasks must complete before the group itself counts as complete, but the order in which the subtasks finish does not matter, as long as the group can determine when all of its subtasks are finished. The main thread must wait for all groups to complete in the same way that a group waits for its subtasks: the order in which the groups complete does not matter, as long as the main thread can determine when they are all done.
To put it a different way:
All groups wait for their respective subtasks to finish. It is not important in what order the subtasks finish as long as the group can figure out when they're all completed.
The main thread waits for all groups to complete. It is not important in what order they complete, as long as the main thread can detect when all groups are completed. In other words, it is exactly the same concept as for the group-specific subtasks.
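One way to picture the two-level bookkeeping described above is a pair of counters: each group counts its outstanding subtasks, and the main thread waits on a count of outstanding groups. The following is only a sketch under assumptions, not a definitive design. The names CountdownLatch, Group, subtask_done and pending are hypothetical, every group is assumed to have at least one subtask, and the code uses the C++11 standard library so that it is self-contained (Boost.Thread offers boost::mutex, boost::condition_variable, boost::lock_guard and boost::unique_lock under the same names).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper: tracks outstanding items and lets a thread block until none remain.
class CountdownLatch {
public:
    explicit CountdownLatch(std::size_t count) : count_(count) {}

    // Called once per finished item; wakes waiters when the count reaches zero.
    void count_down() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(count_ > 0);
        if (--count_ == 0) cv_.notify_all();
    }

    // Blocks the caller (here: the main thread) until the count is zero.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

    // Number of items still outstanding.
    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Hypothetical per-group record. Assumes the group has at least one subtask.
class Group {
public:
    Group(std::size_t subtasks, CountdownLatch& all_groups)
        : remaining_(subtasks), all_groups_(all_groups) {}

    // Each subtask calls this as its last step, from whichever pool thread ran it.
    // The subtask that finishes last reports the whole group as complete.
    void subtask_done() {
        if (remaining_.fetch_sub(1) == 1) all_groups_.count_down();
    }

private:
    std::atomic<std::size_t> remaining_;
    CountdownLatch& all_groups_;
};
```

In this arrangement only the main thread ever blocks (in wait()); a group is just a counter rather than a thread, so no pool thread is tied up waiting for subtasks.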
All this must be done with N threads in a pool plus the main thread, so N+1 threads in total. N must be configurable to any arbitrary value.
If it helps, a task is simply a function that needs to be invoked from one of the N threads.
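To make "a function invoked from one of the N threads" concrete, here is a minimal sketch of a fixed-size pool fed from a single shared queue. It is an illustration under assumptions rather than a recommended implementation: ThreadPool, submit and run_on_pool are hypothetical names, N is passed to the constructor, and the C++11 standard library is used for self-containedness (boost::thread, boost::mutex and boost::condition_variable follow the same pattern).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: n workers pull tasks from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets already queued tasks finish, then joins all workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    // Queues a task; any idle worker may pick it up.
    void submit(std::function<void()> task) {
        assert(task && "empty task");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Hypothetical demo: runs `subtasks` trivial tasks on `n` threads and returns how many ran.
int run_on_pool(std::size_t n, int subtasks) {
    std::atomic<int> ran(0);
    {
        ThreadPool pool(n);
        for (int i = 0; i < subtasks; ++i)
            pool.submit([&ran] { ++ran; });
    }  // the destructor blocks here until the queue is drained
    return ran.load();
}
```

Waiting by destroying the pool is a simplification for the demo. A pool that outlives a single batch would more likely pair each group with a completion counter, so that the main thread can wait for the groups without shutting the workers down.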
Does anyone have any tips for how I might implement this?