
When I run tensorflow training (with custom defined graph, closed source), it outputs the warning:

2018-10-03 14:29:24.352895: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.

What does it mean? What is likely to cause this problem, and how can I avoid it?

Update: For the record, in my case TensorFlow still works correctly despite this warning, so I think it just means there are more loops in the computation graph than TensorFlow expects, not necessarily an infinite loop. I fixed this by avoiding manual loops in the code and using tensor manipulation ops instead (stack, concat, slice, reshape, ...).
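A minimal sketch of the idea (shown here with NumPy rather than the original closed-source model): in graph-mode TensorFlow, a Python loop that accumulates results adds one op per iteration and can tangle the graph's dependencies, whereas a single stack/concat/slice/reshape expresses the same computation loop-free. The data below is purely illustrative.

```python
import numpy as np

# Hypothetical replacement for a manual accumulation loop: build all
# the per-step rows, then combine them with vectorized ops instead of
# appending inside a loop.
rows = [np.arange(4) * i for i in range(3)]

stacked = np.stack(rows)                             # shape (3, 4)
flat = stacked.reshape(-1)                           # shape (12,)
first_two = stacked[:, :2]                           # slice, no inner loop
joined = np.concatenate([stacked, stacked], axis=1)  # shape (3, 8)
print(stacked.shape, flat.shape, first_two.shape, joined.shape)
```

The same pattern carries over to `tf.stack`, `tf.concat`, and `tf.reshape` when building a graph.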

THN

2 Answers


A topological ordering of a directed graph is an ordering of its vertices in such a way that whenever there is an edge from vertex u to vertex v, vertex u comes before vertex v in the ordering.

This kind of ordering is possible for every directed acyclic graph, but not for arbitrary graphs. There is most likely some kind of cycle in your graph that prevents the sorting algorithm from succeeding. So the way to go is to search for the cycle and remove it in some way.

As an example, consider a very small graph with two vertices, u and v, and two edges, u -> v and v -> u. There is no way to sort u and v in accordance with the requirements given above.
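To make this concrete, here is a small Kahn's-algorithm sketch (an illustration of topological sorting in general, not Grappler's actual implementation). In the cyclic two-vertex graph, no vertex ever reaches in-degree zero, so the sort cannot complete:

```python
from collections import deque

def topological_sort(edges, vertices):
    """Kahn's algorithm: return an ordering, or None if a cycle exists."""
    indeg = {v: 0 for v in vertices}
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    # Start from all vertices with no incoming edges.
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # If some vertices were never emitted, they sit on a cycle.
    return order if len(order) == len(vertices) else None

print(topological_sort([("u", "v")], ["u", "v"]))              # ['u', 'v']
print(topological_sort([("u", "v"), ("v", "u")], ["u", "v"]))  # None
```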

Without further information on your code it is hard to say exactly what is going on.

piripiri
  • But what is it in `tensorflow` specifically? A computational graph must not be cyclic, or evaluation would loop forever. Why does `tensorflow` output this warning and still manage to train? How does `tensorflow` break the loop? – THN Oct 03 '18 at 05:32
  • If the graph is topologically ordered, it is straightforward to do the calculation in the correct order without further dependency checks. So the topological ordering is most likely a kind of optimization, just as indicated in your updated error message. – piripiri Oct 04 '18 at 07:58

This error message can also be encountered if there is a mismatch between your TF/CUDA version and the cuDNN version you're using. In that case it seems to be a compatibility bug and doesn't necessarily indicate bugs in your own code or model design. From what I can gather, it happens on TF 1.10 to 1.14. Downgrading or upgrading TF (or cuDNN) to a matching version should get rid of it. Note that although it is logged as an error, it is unclear whether it actually produces wrong results: models appear to train normally apart from the message.
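A minimal way to check which combination is installed before deciding to change anything (the pinned version below is illustrative, not a recommendation; pick whatever matches your CUDA/cuDNN setup):

```shell
# Print the TensorFlow version currently installed.
python -c "import tensorflow as tf; print(tf.__version__)"
# If the installed release falls in the affected range, pin a different one:
pip install "tensorflow-gpu==1.15.*"
```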

runDOSrun