Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0 # time counter
def dfs_visit(g, v):
"""Run iterative DFS from node V"""
global step
total = 0
stack = [v] # create stack with starting vertex
while stack: # while stack is not empty
step += 1
v = stack[-1] # peek top of stack
if v.color: # if already seen
v = stack.pop() # done with this node, pop it from stack
if v.color == 1: # if GRAY, finish this node
v.time_finish = step
v.color = 2 # BLACK, done
else: # seen for first time
v.color = 1 # GRAY: discovered
v.time_discover = step
total += 1
for w in v.child: # for all neighbor (v, w)
if not w.color: # if not seen
stack.append(w)
return total
def dfs(g):
"""Run DFS on graph"""
global step
step = 0 # reset step counter
for k, v in g.nodes.items():
if not v.color:
dfs_visit(g, v)
I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out as white. When it's discovered during the search it is marked as gray. When we are done with it, it is marked as black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.
Here are some additional support codes:
class Node(object):
def __init__(self, name=None):
self.name = name
self.child = [] # children | adjacency list
self.color = 0 # 0: white [unvisited], 1: gray [found], 2: black [finished]
self.time_discover = None # DFS
self.time_finish = None # DFS
class Graph(object):
def __init__(self):
self.nodes = defaultdict(Node) # list of Nodes
self.max_heap = [] # nodes in decreasing finish time for SCC
def build_max_heap(self):
"""Build list of nodes in max heap using DFS finish time"""
for k, v in self.nodes.items():
self.max_heap.append((0-v.time_finish, v)) # invert finish time for max heap
heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of SCC computation, you can build a max heap of Nodes, and use that max heap in dfs_visit() instead of simply the child list.
while g.max_heap:
v = heapq.heappop(g.max_heap)[1]
if not v.color:
size = dfs_visit(g, v)
scc_size.append(size)