0

I have a 2D vector of dependencies. For the first row to finish its tasks in the 2D vector it is dependent on the second row finishing its tasks. The second row is dependent on the third row to finish its tasks before it can begin. This continues on ward. The goal of the algorithm is to use threading and assign each value(task) in the row to a thread. Therefore the threads can finish one row at a time starting from the bottom.

To go a bit more in depth, task all is dependent on task a and task b to finish before it can proceed. Task a is waiting on tasks (a1,a2,a3,a4,d) to finish first before being able to finish. Task b is waiting on tasks (b1,b2,b3,e,a4) to finish before being able to finish etc. Task a and Task b are on the same row. Tasks (a1,a2,a3,a4,d) and tasks (b1,b2,b3,e,a4) are on the same row as well. I want to assign one thread to each task. I'm slightly new here so if there is anything I can do to help clarify I would be happy to do so. I would also appreciate it if there is a way to assign a thread to the next row if all the elements in the prior row are being handled by a thread and the dependencies for that task have been completed.

The amount of threads doesn't matter so we can assume there is a variable thread_count.

DependencyListImage

  • I understand the concept of a dependency graph, but I do not understand the significance you are attributing to "rows" of the particular representation you have depicted. I also do not understand your description of the software architecture you have in mind. One thing that might help, though, is to recognize that you're thinking about it backward: one does not assign threads to tasks; rather, one assigns tasks to threads. – John Bollinger Apr 06 '20 at 23:09
  • @JohnBollinger Hey John to explain it slightly better essentially what I mean is that I have a task I want to carry out which is named task all. But for me to finish the task called all, I have to do task a and task b first. In order to finish task a I have to finish task a1,a2 ... and for b I have to finish task b1,b2 ... . Each task has an arrow to the task that has to be finished for it to take place. The rows are just the depth of each task. I placed tasks of the same depth in the same row because if for example all of row 5's tasks were to be completed it could proceed to row 4s tasks etc – Cantrash123 Apr 06 '20 at 23:17
  • One defines the work of a thread first and foremost by one's choice of the thread's start function. Perhaps that's all you really wanted to know. If you wanted more than that then I'm afraid you'll need to clarify exactly what. – John Bollinger Apr 06 '20 at 23:17
  • @JohnBollinger So that is what I am asking I want to know how to give each thread a different task from row 5. Once those are completed to give each thread a different task from row 4. The structure for the rows I am using is 2D vector. In other words a vector whichs element is a second vector. If possible allow the for loop that I believe I have to use, to assign a task to a thread if all the tasks in the prior row are being completed by another thread or have been completed. – Cantrash123 Apr 06 '20 at 23:21
  • So you have in mind that there will be (or at least can be) fewer threads than tasks? Does the program have access to a representation of the tasks and dependencies? It seems like that would be necessary for the program to make appropriate task assignments dynamically. – John Bollinger Apr 06 '20 at 23:26
  • @JohnBollinger Yes the assumption is that there are fewer threads than tasks. The program does have access to a representation of the tasks and dependencies. Each task if placed in a function such as get_neighbors(task) will give its dependencies as well outside of the 2D array. – Cantrash123 Apr 06 '20 at 23:31
  • On a more practical note, I would suggest you use OpenMP or TBB (if C++ is available to you) to realize a thread pool for you. A naïve implementation could just enqueue new tasks at the end of each task, given your model of dependencies and the current state of fulfillment. See https://stackoverflow.com/questions/13788638/difference-between-section-and-task-openmp for a quick overview of how you can combine OMP tasks with a final barrier to realize this. – ypnos Apr 07 '20 at 09:32

1 Answers1

1

You have work partitioned into a collection of units you're calling "tasks". You want to arrange for these to be executed in multithreaded fashion, taking explicitly into account a set of defined dependencies among tasks. Furthermore, you want to arrange for having threads perform multiple, separate tasks.

Several of those characteristics are crying out for a thread pool. This would provide the general framework for distributing many tasks to fewer threads. But the dependencies among your tasks require you to have a smarter scheduling algorithm than a simple FIFO approach. Basically, your scheduling function must always choose a task whose dependencies (if any) have all already completed. This may mean that some of the available threads are idle some or even all of the time.

I don't think it's very useful to visualize it in terms of handling a dependency tree one row at a time, at least not if in fact there is more than one thread executing tasks in parallel. It is entirely conceivable that there would be ready-to-run tasks at multiple levels of the tree at the same time, and there doesn't seem to be an advantage to withholding them artificially.

I have not written any example code or described any details because the question is too high-level to afford that. There are definitely some tricky details to sort out, but if you have trouble with some of those then that would be the subject of one or more different, more specific questions.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157