1

I am trying to create a workflow management where I am representing the workflow my a graph. The graph is a simple digraph where the nodes represent some tasks and the edges point to the tasks which depend on this task with the parent->child relationship. Now, what I would like to implement is that no task should start until all its parents have finished.

So say I have the following graph represented as python dictionaries:

graph = {'A': ['B', 'D'], 'B': ['D', 'C'], 'C': ['D', 'E'],
     'D': ['E'], 'E': []}

In this case, A should execute first followed by B which is then followed by C. After which D gets executed and finally task E finished which depends on D.

One way I can envision doing this is using a breadth or depth first search to visit all the nodes and then checking at each node if all the parents have been finished. If yes, I start that task and I keep doing this till all the tasks have been started.

I was wondering if there are any more sophisticated/efficient solution to this problem rather than traversing the graph many times?

Luca
  • 10,458
  • 24
  • 107
  • 234
  • 2
    I think the problem you described is that of [topological sorting](https://en.wikipedia.org/wiki/Topological_sorting). – arekolek Aug 02 '16 at 12:04
  • Possible duplicate of [How can I order a list of connections](http://stackoverflow.com/questions/16409486/how-can-i-order-a-list-of-connections) – arekolek Aug 02 '16 at 12:15
  • Is it graph traversal or tree traversal? What is expected output for `C_3`? – xenteros Aug 02 '16 at 12:25

3 Answers3

2

It depends on your graph data structure.

You can use BFS if your graph is stored as nodes, as you will navigate from the root (maximal task dependancy) to the leaves (minimal task dependancy) by going from siblings to siblings (tasks with same level of dependancy).

If your graph is stored as adjacency list as you explained, you sould do a topological sort, to get from the maximal task dependancy to the minimal task dependancy progressively.

That will give you then O(n) complexity for traversing the graph.

hmicn
  • 114
  • 9
1

You won't need multiple traversal. Just a plain BFS will do the job. In BFS, parent is always visited before children.

Pranalee
  • 3,389
  • 3
  • 22
  • 36
1

You can use a library like graph_tool that has topological sort implemented:

graph = {'A': ['B', 'D'], 'B': ['D', 'C'], 'C': ['D', 'E'], 'D': ['E'], 'E': []}
i2a = dict(enumerate(graph))
a2i = {v:k for k,v in i2a.items()}

import graph_tool.all as gt
g = gt.Graph()
g.add_edge_list([(a2i[s], a2i[t]) for s in graph for t in graph[s]])
print(*[i2a[v] for v in gt.topological_sort(g)])

Outputs:

A B C D E
arekolek
  • 9,128
  • 3
  • 58
  • 79
  • Thanks for that tip as well. I have a bit of constraint on external libs that I can use but it is very useful to know this! – Luca Aug 02 '16 at 12:53