kosaraju finding finishing time using iterative dfs

Question

Here is the first part of the code that I have written for Kosaraju's algorithm.

###### reading the data #####
with open('data.txt') as req_file:
        ori_data = []
        for line in req_file:
            line = line.split()
            if line:
                line = [int(i) for i in line]
                ori_data.append(line)

###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])

print revscc_dic        

######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])

print scc_dic        

##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
          path.append(v)
          q=revscc_dic[v]+q
print path

The code reads the data and forms Grev and G correctly. I have written code for an iterative DFS. How can I extend it to compute the finishing times? I understand how to find finishing times with pen and paper, but I do not understand how to implement that in code. I can only proceed with the next part of my code once this is done. Please help. Thanks in advance.

The data.txt file contains:

please save it as data.txt.

The idea of Kosaraju’s algorithm is like this: 1. Compute the finishing time of all the vertices by doing a DFS on the reversed graph; 2. Replace the vertex index with its finishing time to get a new graph, DFS the new graph to compute each vertex’s leader vertex( in a strongly connected component(SCC) if exists any); 3. Do statistical job on the leader vertices’ index. If multiple vertices have same leader vertex, they are in the same SCC. — user 3317704, Jun 05 '14 at 07:03
Keep a counter initialized to N. While performing the DFS on reversed graph, as soon as you visit a vertex, mark the topological order (finishing time) of that vertex as the counter value and decrement the counter. — Abhishek Bansal, Jun 05 '14 at 07:16
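The three steps outlined in the first comment can be sketched roughly as follows. This is only an illustration under my own assumptions: the function name kosaraju_scc and its edge-list input are hypothetical, and instead of relabelling vertices with their finishing times it keeps a list of vertices in finishing order, which serves the same purpose.

```python
from collections import defaultdict

def kosaraju_scc(edges):
    """Hypothetical helper: return the SCCs of a directed graph as a list of sets."""
    graph, rev, vertices = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)
        vertices.update((u, v))

    # Step 1: iterative DFS on the reversed graph, recording finishing order.
    order, seen = [], set()
    for s in vertices:
        stack = [(s, False)]
        while stack:
            v, done = stack.pop()
            if done:
                order.append(v)          # all of v's descendants are finished
            elif v not in seen:
                seen.add(v)
                stack.append((v, True))  # marker: v finishes when this is popped
                for w in rev.get(v, []):
                    if w not in seen:
                        stack.append((w, False))

    # Step 2: DFS on the original graph in decreasing finishing time,
    # assigning a leader to every vertex reached.
    leader = {}
    for s in reversed(order):
        if s in leader:
            continue
        leader[s] = s
        stack = [s]
        while stack:
            v = stack.pop()
            for w in graph.get(v, []):
                if w not in leader:
                    leader[w] = s
                    stack.append(w)

    # Step 3: vertices sharing a leader form one SCC.
    groups = defaultdict(set)
    for v, l in leader.items():
        groups[l].add(v)
    return list(groups.values())
```

For example, kosaraju_scc([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)]) groups the vertices into {1, 2, 3} and {4, 5}.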

James Lawson · Accepted Answer · 2014-11-06T15:04:54.067

With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
However with iterative dfs, this is not so easy. Now that we are iteratively processing the queue using a while loop we have lost some of the nested structure that is associated with function calls. Or more precisely, we don't know when backtracking occurs. Unfortunately, there is no way to know when backtracking occurs without adding some additional information to our stack of vertices.
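For comparison, here is a minimal recursive sketch showing where the finish time falls out naturally. The function name, the dict-of-lists graph, and the one-element clock list are my own assumptions for illustration, not part of the question's code.

```python
def dfs_recursive(graph, v, seen, finish, clock):
    """Recursive DFS; finish[v] is set once every child of v has returned."""
    seen.add(v)
    for w in graph.get(v, []):
        if w not in seen:
            dfs_recursive(graph, w, seen, finish, clock)
    # All recursive calls for v's children have returned: v is finished.
    finish[v] = clock[0]
    clock[0] += 1

graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
finish = {}
dfs_recursive(graph, 1, set(), finish, [0])
print(finish)
```

The assignment to finish[v] sits right after the loop over the children, which is exactly the point that has no obvious counterpart in a plain iterative loop.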

The quickest way to add finishing times to your dfs implementation is like so:

##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print path  
print finish_time_dic

The trick used here is that when we pop off v from the stack, if it is the first time we have seen it, then we add it back to the stack again. This is done using: q = [v] + q. We must push v onto the stack before we push on its neighbours (we write the code that pushes v before the for loop that pushes v's neighbours) - or else the trick doesn't work. Eventually we will pop v off the stack again. At this point, v has finished! We have seen v before, so, we go into the else case and compute a fresh finish time.
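The same re-push trick can also be written with a set for visited vertices and the end of a list as the top of the stack. This is a sketch under my own assumptions: finish_times and its parameters are hypothetical names, and rev_graph.get(v, []) is used so that vertices without outgoing edges do not raise a KeyError.

```python
def finish_times(rev_graph, vertices):
    """Iterative DFS over rev_graph; returns {vertex: finish_time}."""
    seen = set()
    finished = {}
    time = 0
    for start in vertices:
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.append(v)  # re-push v; popping it again means it finished
                for w in rev_graph.get(v, []):
                    if w not in seen:
                        stack.append(w)
            elif v not in finished:
                finished[v] = time
                time += 1
    return finished

example = finish_times({1: [2], 2: [3], 3: [1], 4: [3]}, [4, 3, 2, 1])
print(example)
```

On this small example the vertices finish in the order 2, 1, 3, 4.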

For the graph provided, finish_time_dic gives the correct finishing times:

{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}

Note that pushing each node back onto the stack does not change the asymptotic cost: the approach is still O(V+E), provided that the visited check and the stack operations take constant time. The list-based membership test (v not in path) and the list prepend (q = [w] + q) used above are linear in the list length, so reaching that bound in practice needs a set for visited vertices and append/pop at the end of the list. More elegant solutions exist, however. I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Exercises 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternate solution to the problem.

Questions:

5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?

5-7 Write a nonrecursive version of DFS that can determine finish-times.

Answers:

5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.

5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
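A rough sketch of that marker idea, as I read it, is below. It is my own illustration rather than the book's code: the function name is hypothetical, the graph is assumed to be a dict of adjacency lists, and no vertex may itself be None.

```python
def dfs_with_markers(graph, start):
    """Iterative DFS that pushes (u, None) as the backtracking marker for u."""
    seen = set()
    finish = {}
    time = 0
    stack = [(None, start)]  # entries are edges (u, v); the first has no source
    while stack:
        u, v = stack.pop()
        if v is None:
            # Marker reached: everything pushed after it is done, so u finishes.
            finish[u] = time
            time += 1
            continue
        if v in seen:
            continue
        seen.add(v)
        stack.append((v, None))       # marker goes on before v's edges
        for w in graph.get(v, []):
            stack.append((v, w))
    return finish

order = dfs_with_markers({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}, 'a')
print(order)
```

Because the marker sits underneath all of v's edges on the stack, it can only be popped after every descendant reachable through those edges has been processed.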

Minor improvement: before line `for w in revscc_dic[v]` add `if (v in revscc_dic):`, otherwise it will break for vertices with no outgoing edges. — Maria Ines Parnisari, Oct 30 '16 at 22:30

score 4 · Answer 2 · answered Mar 11 '18 at 23:11

Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.

Here's my implementation which I feel demonstrates what's going on a bit more clearly:

step = 0  # time counter

def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]  # create stack with starting vertex
    while stack:  # while stack is not empty
        step += 1
        v = stack[-1]  # peek top of stack
        if v.color:  # if already seen
            v = stack.pop()  # done with this node, pop it from stack
            if v.color == 1:  # if GRAY, finish this node
                v.time_finish = step
                v.color = 2  # BLACK, done
        else:  # seen for first time
            v.color = 1  # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:  # for all neighbor (v, w)
                if not w.color:  # if not seen
                    stack.append(w)
    return total

def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0  # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)

I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.

All nodes start out white. When a node is discovered during the search, it is marked gray. When we are done with it, it is marked black.

Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.

It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.

Here is some additional supporting code:

from collections import defaultdict
import heapq

class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []  # children | adjacency list
        self.color = 0  # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None  # DFS
        self.time_finish = None  # DFS

class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)  # maps node name -> Node
        self.max_heap = []  # nodes in decreasing finish time for SCC

    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)

To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
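As a sketch of what that could look like, assuming the edges arrive as (tail, head) pairs: the parent attribute and the build_nodes helper below are hypothetical names introduced only for this illustration, and the Node class is trimmed to the fields the example needs.

```python
class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []   # outgoing edges (original graph)
        self.parent = []  # incoming edges (reversed graph); hypothetical attribute

def build_nodes(edges):
    """Build a {name: Node} dict, filling child and parent lists together."""
    nodes = {}
    for tail, head in edges:
        for name in (tail, head):
            if name not in nodes:
                nodes[name] = Node(name)
        nodes[tail].child.append(nodes[head])
        nodes[head].parent.append(nodes[tail])
    return nodes

nodes = build_nodes([(1, 2), (2, 3), (3, 1)])
print([n.name for n in nodes[1].child], [n.name for n in nodes[1].parent])
```

Iterating over v.parent instead of v.child then walks the reversed graph without building a second set of Node objects.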

To process Nodes in decreasing finish time for the last part of the SCC computation, you can build a max heap of Nodes and pop from that heap in the outer loop (in place of iterating over g.nodes in dfs()), calling dfs_visit() on each node that has not been visited yet:

    while g.max_heap:
        v = heapq.heappop(g.max_heap)[1]
        if not v.color:
            size = dfs_visit(g, v)
            scc_size.append(size)

score 2 · Answer 3 · answered Sep 20 '16 at 14:49

I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is code for my version which has a 1-to-1 mapping with a recursive version of DFS.

n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)

# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True

# Loop through vertices in reverse order
for v in xrange(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True

            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:

                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break

            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time

score 1 · Answer 4 · answered Mar 24 '21 at 12:41

Here are the recursive and iterative implementations in Java:

int time = 0;

public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}

public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}

score -1 · Answer 5 · answered May 19 '16 at 17:43

First, you should know exactly what the finish time is. In recursive DFS, a node v is finished once all of its adjacent nodes have finished. With this in mind, you need an additional data structure to store that information.

adj[][]  //graph
visited[]=NULL //array of visited node
finished[]=NULL //array of finished node
Stack st=new Stack  //normal stack 
Stack backtrack=new Stack //additional stack
function getFinishedTime(){
for(node i in adj){
     if (!visited.contains(i)){
         st.push(i);
         visited.add(i)
         while(!st.isEmpty){
              int j=st.pop();
              int[] unvisitedChild= getUnvistedChild(j);
              if(unvisitedChild!=null){
                   for(int c in unvisitedChild){
                        st.push(c);
                        visited.add(c);
                    }
                    backtrack.push([j,unvisitedChild]); //you can store each entry as array with the first index as the parent node j, followed by all the unvisited child node.
              }
              else{ 
                   finished.add(j);
                   while(!backtrack.isEmpty&&finished.containsALL(backtrack.peek())) //all of the child nodes are finished, so we can mark the parent node as finished
                   {
                   parent=backtrack.pop()[0];
                   finished.add(parent);
                   }
              }
        }
    }
}

 function getUnvistedChild(int i){
     unvisitedChild[]=null
     for(int child in adj[i]){
        if(!visited.contains(child))
            unvisitedChild.add(child);
     }
     return unvisitedChild;
 }

and the finished time should be [5, 2, 8, 3, 6, 9, 1, 4, 7]
