
Imagine I have a directed acyclic graph (DAG) with vertices and edges.

A vertex can be one of the following two types:

  • A computational task (T)
  • A resource (R)

An edge represents a dependency. It ALWAYS originates from some computational task vertex T and ends in some resource vertex R.

Restrictions on the graph structure:

  • Task vertices depend only on resource vertices (not on other tasks). This means there are ONLY outgoing edges from computational tasks, and there are ONLY incoming edges into resources.
  • A task vertex cannot have more than one edge to the same resource vertex.
  • Resource vertices don't depend on anything (no outgoing edges).
  • Each computational task vertex has a minimum of 3 outgoing edges and a maximum of 4. This means a computational task depends on at least 3 and at most 4 resources.
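A minimal sketch of how these structural rules could be checked, assuming the graph is stored as a mapping from each task to the set of resources it depends on (task/resource names here are hypothetical):

```python
def valid(task_deps):
    """task_deps maps each task name to the set of resources it depends on.
    Using sets rules out duplicate edges to the same resource by construction;
    the remaining check is the 3-to-4 out-degree rule."""
    return all(3 <= len(resources) <= 4 for resources in task_deps.values())

print(valid({"T1": {"R1", "R2", "R3"}}))  # True: 3 resources is allowed
print(valid({"T2": {"R1", "R2"}}))        # False: only 2 resources
```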

Semantics:

  • The above graph is a task dependency graph. Whenever a task Tx runs, it blocks ALL the resources it depends on until it completes.
  • Each resource cannot be used by more than 1 task at any given time. So, tasks can briefly prevent other tasks from running until they are done.
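Under these semantics, two tasks can run concurrently exactly when their resource sets are disjoint. A small sketch with hypothetical resource sets:

```python
def conflict(task_a: set, task_b: set) -> bool:
    """Two tasks conflict iff they share at least one resource,
    since a running task blocks ALL resources it depends on."""
    return not task_a.isdisjoint(task_b)

t1 = {"R1", "R2", "R3"}  # hypothetical resource sets, not from the image
t2 = {"R3", "R4", "R5"}
t3 = {"R6", "R7", "R8"}

print(conflict(t1, t2))  # True: they share R3
print(conflict(t1, t3))  # False: disjoint, so they can run in parallel
```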

Question:

Given the above graph, what algorithm can I use to compute all the tasks that can run in parallel without blocking each other? That is, at any given time I want to achieve maximum parallelization. I will use the algorithm to discover all the tasks that don't block each other and run them. Whenever a task completes, I want to re-evaluate the graph to see if I can spin off more tasks that got unblocked.

I'm looking to know the algorithm I could use for this kind of computation. This sounds like a hard graph problem, but I suspect this kind of problem is not entirely new.

Example:

In the provided example, I can first run T1 and T3. When these are done, I can run T2 and T4.

[Image: example dependency graph with tasks T1–T4 and their edges to resources]
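The re-evaluation loop described above can be sketched greedily: on each pass, pick pending tasks whose resources are all free and mutually disjoint. The resource sets below are hypothetical, chosen only to reproduce the example's behaviour (the real sets come from the image), and a greedy pick is not guaranteed to be the largest possible batch:

```python
TASKS = {
    "T1": {"R1", "R2", "R3"},
    "T2": {"R3", "R4", "R5"},
    "T3": {"R6", "R7", "R8"},
    "T4": {"R7", "R9", "R10"},
}

def runnable(pending, busy_resources):
    """Greedily select pending tasks whose resource sets are disjoint
    from busy_resources and from each other."""
    batch = []
    taken = set(busy_resources)
    for name in sorted(pending):
        resources = TASKS[name]
        if resources.isdisjoint(taken):
            batch.append(name)
            taken |= resources
    return batch

# Simulate waves: re-evaluate whenever the current batch finishes.
pending = set(TASKS)
waves = []
while pending:
    batch = runnable(pending, set())  # nothing is running between waves
    waves.append(batch)
    pending -= set(batch)

print(waves)  # [['T1', 'T3'], ['T2', 'T4']]
```

This reproduces the example: T1 and T3 first (T2 shares R3 with T1, T4 shares R7 with T3), then T2 and T4 once they are unblocked.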

  • Your original premise is broken because it assumes all tasks use all of their resources simultaneously all of the time, and this is extremely unlikely. For example; maybe (at a specific point in time) T4 is using R7 only, T1 is using R2 only and T2 is using R3 and R4; and maybe you can run all four tasks in parallel. – Brendan Feb 26 '19 at 07:19
  • Note that if T1, T2 and T4 always use R3 and therefore T1, T2 and T4 can never be running at the same time; then there's no point using three threads and you'd just have a single "T124" thread that depends on R1, R2, R3, R4 and R7; but then if "T124" and T3 always use R7 then there'd be no point having "T124" and T3 so you'd just have a single "T1234" thread. – Brendan Feb 26 '19 at 07:24
  • Also note that if the goal is to maximize parallelism in the likely case that tasks only use resources some of the time; then it has to be adaptive (e.g. use mutexes to block tasks if and only if it's actually necessary) so that the maximum number of threads can be running at any specific point in time. Of course in that case you end up with `max_tasks_running_in_parallel = min(total_tasks, total_CPUs)`, which is a bit too trivial to state. – Brendan Feb 26 '19 at 07:28
  • You have completely misunderstood the problem statement. Resources are not CPUs here and it is perfectly possible to have a task use up all its resources in parallel (as explained in the stmt). Please view "task" and "resource" as abstractions, and don't relate these to threads/CPUs. – Kowshik Feb 26 '19 at 07:52
  • I know the resources aren't CPUs, and I don't think I've misunderstood anything. – Brendan Feb 26 '19 at 08:00
  • Note: I did use the word "thread" when I should've (for the sake of consistency) used the word "task", but it really makes no difference which word is used (you could call them "borklettes" if you really wanted to and it wouldn't matter). – Brendan Feb 26 '19 at 08:11

1 Answer


Denoting the set of resources as S, and each task as a subset of S, your problem is the maximum set packing problem.
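Maximum set packing is NP-hard in general, so large instances need heuristics or an ILP solver, but small instances can be solved exactly by brute force. A sketch with hypothetical task/resource sets:

```python
from itertools import combinations

TASKS = {
    "T1": {"R1", "R2", "R3"},  # hypothetical resource sets
    "T2": {"R3", "R4", "R5"},
    "T3": {"R6", "R7", "R8"},
}

def max_packing(tasks):
    """Return a largest collection of tasks with pairwise-disjoint
    resource sets, by trying subset sizes from largest to smallest."""
    names = list(tasks)
    for k in range(len(names), 0, -1):
        for combo in combinations(names, k):
            used = set()
            ok = True
            for name in combo:
                if not tasks[name].isdisjoint(used):
                    ok = False
                    break
                used |= tasks[name]
            if ok:
                return list(combo)
    return []

print(max_packing(TASKS))  # ['T1', 'T3'] — T2 conflicts with T1 via R3
```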
