451

Is there an efficient algorithm for detecting cycles within a directed graph?

I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Peauters
  • 4,773
  • 3
  • 18
  • 11
  • 20
    You say you want to detect all cycles, but your use-case suggests that it would be sufficient to detect whether there are any cycles. – Steve Jessop Nov 04 '08 at 11:37
  • 37
    It would be better to detect all cycles so they could be fixed in one go, rather than check, fix, check, fix etc. – Peauters Nov 04 '08 at 12:04
  • 2
    You should read the paper "Finding all the elementary circuits of a directed graph" by Donald B. Johnson. It will find only elementary circuits, but this should be sufficient for your case. And here is my Java implementation of this algorithm ready to use: https://github.com/1123/johnson – user152468 Feb 14 '15 at 09:17
  • Run DFS with additional modification for the algorithm: mark each node you visited. if you visit a node that is already visited, then you have a cicle. when you retreat from a path, unmark the nodes that are visited. – Hesham Yassin May 07 '16 at 09:32
  • 7
    @HeshamYassin, if you visit a node you already visited, it doesn't necessarily mean there's a loop. Please read my comment http://cs.stackexchange.com/questions/9676/the-purpose-of-grey-node-in-graph-depth-first-search#comment140072_9681. – Maksim Dmitriev Jan 15 '17 at 17:40
  • Your first sentence contradicts your last sentence; please fix. If you really want to detect *all* the cycles (first sentence), your worst case output size, and runtime, will be exponential with respect to your input size. If you really want to just detect the error case of *any* cycle (last sentence), you can do that in time that's linear in the size of the input. I'd recommend the latter. – Don Hatch Feb 04 '19 at 22:37
  • @Pneauters no, it would not necessarily be better to detect all cycles. Consider the case when there is an exponential number of them. – Don Hatch Feb 04 '19 at 22:40
  • You can use my simple and effective implementation: https://stackoverflow.com/a/60196714/1763149 – Marcin Raczyński Feb 12 '20 at 21:39

14 Answers14

216

Tarjan's strongly connected components algorithm has O(|E| + |V|) time complexity.

For other algorithms, see Strongly connected components on Wikipedia.

nbro
  • 15,395
  • 32
  • 113
  • 196
aku
  • 122,288
  • 32
  • 173
  • 203
  • 85
    How does finding the strongly connected components tell you about the cycles that exist in the graph? – Peter Mar 29 '10 at 17:21
  • 4
    May be somebody can confirm but the Tarjan algorithm does not support cycles of nodes pointing directly to themselves, like A->A. – Cédric Guillemette Sep 14 '10 at 13:05
  • 30
    @Cedrik Right, not directly. This isn't a flaw in Tarjan's algorithm, but the way it is used for this question. Tarjan doesn't directly find *cycles*, it finds strongly connected components. Of course, any SCC with a size greater than 1 implies a cycle. Non-cyclic components have a singleton SCC by themselves. The problem is that a self-loop will also go into a SCC by itself. So you need a separate check for self-loops, which is pretty trivial. – mgiuca Feb 09 '11 at 01:13
  • As Peter has noted, strongly connected components are not equivalent to cycles, but may help in efficiently finding cycles within a graph. For finding cycles, see the paper "Finding all the elementary cycles of a directed graph by" Donald B. Johnson, and my implementation thereof in Java: https://github.com/1123/johnson – user152468 Feb 14 '15 at 09:22
  • 22
    (all strongly connected components in the graph) != (all cycles in the graph) – optimusfrenk Feb 16 '15 at 14:24
  • 2
    From Wikipedia: "If each strongly connected component is contracted to a single vertex, the resulting graph is a directed acyclic graph, the condensation of G. A directed graph is acyclic if and only if it has no strongly connected subgraphs with more than one vertex, because a directed cycle is strongly connected and every nontrivial strongly connected component contains at least one directed cycle." – João Almeida Feb 03 '17 at 10:47
  • 7
    @ aku : A three color DFS also has same runtime `O(|E| + |V|)`. Using white (never visited), grey (current node is visited but all reachable nodes are not yet visited) and black (all reachable nodes are visited along with the current one) color coding, if a grey node finds another grey node then we've a cycle. [Pretty much what we've in Cormen's algorithm book]. Wondering if 'Tarjan's algorithm' has any benefit over such DFS!! – KGhatak Jun 23 '17 at 16:29
  • 2
    This answer is correct, but Tarjan's algorithm is overkill if you only need to detect cycles, not enumerate them. See Kurt Peek's answer for the simpler case of detecting but not enumerating cycles. – Luke Hutchison Mar 28 '20 at 21:11
85

Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.

nbro
  • 15,395
  • 32
  • 113
  • 196
Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • A topological sort can detect cycles, inasmuch as it relies on a depth-first search algorithm, but you need additional bookkeeping to actually detect cycles. See Kurt Peek's correct answer. – Luke Hutchison Mar 28 '20 at 21:11
73

According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):

A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.

This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS. The example graph is illustrated below.

enter image description here

CLRS' pseudo-code for depth-first search reads:

enter image description here

In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).

The key realization is that a back edge is encountered when, in the DFS-VISIT function, while iterating over the neighbors v of u, a node is encountered with the GRAY color.

The following Python code is an adaptation of CLRS' pseudocode with an if clause added which detects cycles:

import collections


class Graph(object):
    def __init__(self, edges):
        self.edges = edges
        self.adj = Graph._build_adjacency_list(edges)

    @staticmethod
    def _build_adjacency_list(edges):
        adj = collections.defaultdict(list)
        for edge in edges:
            adj[edge[0]].append(edge[1])
            adj[edge[1]] # side effect only
        return adj


def dfs(G):
    discovered = set()
    finished = set()

    for u in G.adj:
        if u not in discovered and u not in finished:
            discovered, finished = dfs_visit(G, u, discovered, finished)


def dfs_visit(G, u, discovered, finished):
    discovered.add(u)

    for v in G.adj[u]:
        # Detect cycles
        if v in discovered:
            print(f"Cycle detected: found a back edge from {u} to {v}.")
            break

        # Recurse into DFS tree
        if v not in finished:
            dfs_visit(G, v, discovered, finished)

    discovered.remove(u)
    finished.add(u)

    return discovered, finished


if __name__ == "__main__":
    G = Graph([
        ('u', 'v'),
        ('u', 'x'),
        ('v', 'y'),
        ('w', 'y'),
        ('w', 'z'),
        ('x', 'v'),
        ('y', 'x'),
        ('z', 'z')])

    dfs(G)

Note that in this example, the time in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.

When this script is executed, it prints the following output:

Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.

These are exactly the back edges in the example in CLRS Figure 22.4.

Jthorpe
  • 9,756
  • 2
  • 49
  • 64
Kurt Peek
  • 52,165
  • 91
  • 301
  • 526
  • 1
    I get `RecursionError: maximum recursion depth exceeded while calling a Python object` for this code. – zino Nov 30 '20 at 23:12
  • 3
    @zino Looks like there should be a `break` after the cycle is detected. I tried adding it but the edit queue is full. – A_P Dec 09 '20 at 03:53
  • nit: `discovered, finished = dfs_visit(G, u, discovered, finished)` can be replaced with: `dfs_visit(G, u, discovered, finished)` and `dfs-visit` can return `None` – Sanandrea Nov 29 '21 at 23:52
  • @zino @A_P I edited the example to add that missing `break` statement after a cycle has been detected. – Christian Long May 10 '22 at 20:00
  • However, running this on other graphs I'm getting `RuntimeError: dictionary changed size during iteration`. This is because `G.adj` is a default dict, and the lookup in `for v in G.adj[u]` is adding to it. – Christian Long May 10 '22 at 21:28
  • 1
    @ChristianLong You can add a call to `adj[edge[1]]` in the for loop of `_build_adjacency_list` which has the effect of initializing the keys for nodes that don't appear as `edge[0]` for at least some edge (which will generally happen in graph's without a cycle) (added to the answer just now) – Jthorpe Dec 14 '22 at 21:48
38

The simplest way to do it is to do a depth first traversal (DFT) of the graph.

If the graph has n vertices, this is a O(n) time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2).

You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.

nbro
  • 15,395
  • 32
  • 113
  • 196
  • 27
    This would be true for a "regular" graph, but is false for a *directed* graph. For example, consider the "diamond dependency diagram" with four nodes: A with edges pointing to B and C, each of which has an edge pointing to D. Your DFT traversal of this diagram from A would incorrectly conclude that the "loop" was actually a cycle - although there is a loop, it is not a cycle because it cannot be traversed by following the arrows. – Peter Mar 29 '10 at 17:19
  • 13
    @peter can you please explain how DFT from A will incorrectly conclude that there is a cycle? – Deepak Sep 26 '12 at 12:50
  • 15
    @Deepak - In fact, I misread the answer from "phys wizard": where he wrote "in the stack" I thought "has already been found". It would indeed be sufficient (for detecting a directed loop) to check for dupes "in the stack" during the execution of a DFT. One upvote for each of you. – Peter Sep 27 '12 at 13:38
  • 2
    Why do you say the time complexity is `O(n)` while you suggest checking the stack to see if it already contains a visited node? Scanning the stack adds time to `O(n)` runtime because it has to scan the stack on each new node. You can achieve `O(n)` if you mark the nodes visited – James Wierzba Aug 03 '16 at 20:27
  • As Peter said, this is incomplete for directed graphs. See Kurt Peek's correct answer. – Luke Hutchison Mar 28 '20 at 21:09
34

In my opinion, the most understandable algorithm for detecting cycle in a directed graph is the graph-coloring-algorithm.

Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.

For an in depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/

Also, I provide an implementation of graph coloring in JavaScript https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js

Armin Primadi
  • 754
  • 6
  • 10
29

Start with a DFS: a cycle exists if and only if a back-edge is discovered during DFS. This is proved as a result of white-path theorum.

nbro
  • 15,395
  • 32
  • 113
  • 196
Ajay Garg
  • 315
  • 3
  • 2
  • 4
    Yes, i think the same, but this isn't enough, I post my way cs.stackexchange.com/questions/7216/find-the-simple-cycles-in-a-directed-graph – Jonathan Prieto-Cubides Dec 06 '12 at 21:58
  • True. Ajay Garg is only telling about how to find "a cycle", which is a part answer for this question. Your link talks about finding all cycles as per the question asked, but again it looks like it uses same approach as Ajay Garg, but also does all possible dfs-trees. – Manohar Reddy Poreddy Sep 11 '16 at 06:39
  • This is incomplete for directed graphs. See Kurt Peek's correct answer. – Luke Hutchison Mar 28 '20 at 21:09
  • It doesn't answer a question, a question asks for a solution to find all the cycles – sia Jul 25 '20 at 01:58
9

If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.

Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
7

There is no algorithm which can find all the cycles in a directed graph in polynomial time. Suppose, the directed graph has n nodes and every pair of the nodes has connections to each other which means you have a complete graph. So any non-empty subset of these n nodes indicates a cycle and there are 2^n-1 number of such subsets. So no polynomial time algorithm exists. So suppose you have an efficient (non-stupid) algorithm which can tell you the number of directed cycles in a graph, you can first find the strong connected components, then applying your algorithm on these connected components. Since cycles only exist within the components and not between them.

Yuwen
  • 963
  • 11
  • 8
  • 1
    True, if the number of nodes is taken as the size of the input. You could also describe the runtime complexity in terms of the number of edges or even cycles, or a combination of these measures. The algorithm "Finding all the elementary circuits of a directed graph" by Donald B. Johnson has polynomial running time given by O((n + e)(c + 1)) where n is the number of nodes, e the number of edges and c the number of elementary circuits of the graph. And here is my Java implementation of this algorithm: https://github.com/1123/johnson. – user152468 Feb 14 '15 at 09:14
4

I had implemented this problem in sml ( imperative programming) . Here is the outline . Find all the nodes that either have an indegree or outdegree of 0 . Such nodes cannot be part of a cycle ( so remove them ) . Next remove all the incoming or outgoing edges from such nodes. Recursively apply this process to the resulting graph. If at the end you are not left with any node or edge , the graph does not have any cycles , else it has.

Rpant
  • 974
  • 2
  • 14
  • 37
2

https://mathoverflow.net/questions/16393/finding-a-cycle-of-fixed-length I like this solution the best specially for 4 length:)

Also phys wizard says u have to do O(V^2). I believe that we need only O(V)/O(V+E). If the graph is connected then DFS will visit all nodes. If the graph has connected sub graphs then each time we run a DFS on a vertex of this sub graph we will find the connected vertices and wont have to consider these for the next run of the DFS. Therefore the possibility of running for each vertex is incorrect.

Community
  • 1
  • 1
1

The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.

Steve
  • 6,382
  • 3
  • 41
  • 66
  • 5
    That does not make sense. If the graph has cycles, there is no topological sorting, which means any correct algorithm for topological sorting will abort. – sleske Sep 30 '09 at 15:47
  • 5
    from wikipedia: *Many topological sorting algorithms will detect cycles too, since those are obstacles for topological order to exist.* – Oleg Mikheev Jan 20 '13 at 01:59
  • 1
    @OlegMikheev Yes, but Steve is saying " If that number is less than the total number of vertices in the DAG, you have a cycle", which does not make sense. – nbro Jul 28 '15 at 22:24
  • @nbro I'd bet, they mean a variant of topological sorting algorithm which aborts when no topological sorting exists (and then they don't visit all vertices). – maaartinus Sep 11 '17 at 20:25
  • If you do a topological sorting on a graph with cycle you will end up with an order that has the least amount of bad edges(order number > order number of neighbour). But after you have to the sorting its easy to detect those bad edges resulting in detecting a graph with a cycle – Plagon Feb 04 '18 at 00:02
0

As you said, you have set of jobs, it need to be executed in certain order. Topological sort given you required order for scheduling of jobs(or for dependency problems if it is a direct acyclic graph). Run dfs and maintain a list, and start adding node in the beginning of the list, and if you encountered a node which is already visited. Then you found a cycle in given graph.

Bhagwati Malav
  • 3,349
  • 2
  • 20
  • 33
-1

If DFS finds an edge that points to an already-visited vertex, you have a cycle there.

nbro
  • 15,395
  • 32
  • 113
  • 196
mafonya
  • 2,152
  • 22
  • 21
  • 2
    Fails on 1,2,3: 1,2; 1,3; 2,3; – noisy cat Feb 04 '14 at 09:19
  • @kittyPL can you explain why that case fails? Either 1) That is a directed graph and so no cycle has been formed or 2) DFS would go 1 -> 2 (using 1,2), 2 -> 3 (using 2,3), 3 -> 1 (using 1,3) – Jake Greene Feb 05 '14 at 01:24
  • 4
    @JakeGreene Look here: i.imgur.com/tEkM5xy.png Simple enough to understand. Lets say you start from 0. Then you go to the node 1, no more paths from there, reucrsion goes back. Now you visit node 2, which has a edge to the vertex 1, which was visited already. In your opinion you would have a cycle then - and you do not have one really – noisy cat Feb 10 '14 at 01:25
  • 3
    @kittyPL That graph does not contain a cycle. From Wikipedia: "A directed cycle in a directed graph is a sequence of vertices starting and ending at the same vertex such that, for each two consecutive vertices of the cycle, there exists an edge directed from the earlier vertex to the later one" You have to be able to follow a path from V that leads back to V for a directed cycle. mafonya's solution works for the given problem – Jake Greene Feb 10 '14 at 06:00
  • 2
    @JakeGreene Of course it does not. Using your algorithm and starting from 1 you would detect a cycle anyway... This algorithm is just bad... Usually it would be sufficient to walk backwards whenever you encounter a visited vertex. – noisy cat Feb 10 '14 at 14:10
  • 6
    @kittyPL DFS does work to detect cycles from the given starting node. But when doing DFS you must color visited nodes to distinguish a cross-edge from back-edge. First time visiting a vertex it turns grey, then you turn it black once all its edges have been visited. If when doing the DFS you hit a grey vertex then that vertex is an ancestor (ie: you have a cycle). If the vertex is black then it's just a cross-edge. – Kyrra Dec 25 '14 at 04:29
  • @kittyPLm it's not a bad algorithm if you keep track of the nodes that are on the recursive call stack. – Pavel Sep 18 '15 at 15:07
-13

If a graph satisfy this property

|e| > |v| - 1

then the graph contains at least on cycle.

nbro
  • 15,395
  • 32
  • 113
  • 196