BFS, Iterative DFS, and Recursive DFS: When to Mark Node as Visited

Question

After many hours spent googling, I still have not come across an in depth, intuitive as well as solidly proven treatment of this question. The closest article I have found, linked to on some obscure discussion forum, is this: https://11011110.github.io/blog/2013/12/17/stack-based-graph-traversal.html. I have also seen this Stack Overflow question DFS vs BFS .2 differences, but the responses do not arrive at a clear consensus.

So here is the question: I have seen it stated (in Wikipedia, as well as Algorithms Illuminated by Tim Roughgarden) that, to transform a BFS implementation into an iterative DFS one, the following two changes are made:

The non-recursive implementation is similar to breadth-first search but differs from it in two ways: it uses a stack instead of a queue, and it delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before adding the vertex.

Can anyone help explain, via intuition or example, the reason for the second distinction here? Specifically: what is the differentiating factor between BFS, iterative DFS, and recursive DFS that necessitates postponing the check until after popping off the stack only for iterative DFS?

Here is a basic implementation of BFS:

    def bfs(adjacency_list, source):
        explored = [False] * len(adjacency_list)
        queue = deque()
        queue.append(source)
        explored[source] = True
        while queue:
            node = queue.popleft()
            print(node)
            for n in adjacency_list[node]:
                if explored[n] == False:
                    explored[n] = True
                    queue.append(n)

If we simply swap the queue for a stack, we get this implementation of DFS:

    def dfs_stack_only(adjacency_list, source):
        explored = [False] * len(adjacency_list)
        stack = deque()
        stack.append(source)
        explored[source] = True
        while stack:
            node = stack.pop()
            print(node)
            for n in adjacency_list[node]:
                if explored[n] == False:
                    explored[n] = True
                    stack.append(n)

The only difference between these two algorithms here is that we swapped the queue from BFS for a stack in DFS. This implementation of DFS actually produces incorrect traversals (in a non simplistic graph; possibly for a very simple graph it might anyway produce a correct traversal).

I believe that this is the 'error' referenced in the article linked above.

However, this can be fixed in one of two ways.

Either of these two implementations produces a correct traversal:

First, the implementation suggested in the sources above, with the check delayed until after popping the node from the stack. This implementation results in many duplicates on the stack.

    def dfs_iterative_correct(adjacency_list, source):
        explored = [False] * len(adjacency_list)
        stack = deque()
        stack.append(source)
        while stack:
            node = stack.pop()
            if explored[node] == False:
                explored[node] = True
                print(node)
                for n in adjacency_list[node]:
                    stack.append(n)

Alternatively, this is a popular online implementation (this one taken from Geeks for Geeks) which also produces the correct traversal. There are some duplicates on the stack, but hardly as many as the previous implementation.

def dfs_geeks_for_geeks(adjacency_list, source):
    explored = [False] * len(adjacency_list)
    stack = deque()
    stack.append(source)

    while len(stack):
        node = stack.pop()
        if not explored[node]:
            explored[node] = True
            print(node)

        for n in adjacency_list[node]:
            if not explored[n]:
                stack.append(n)

So in summary, it seems that the difference is not solely about when you check the visited status of a node, but more about when you actually mark it as visited. Furthermore, why does marking it as visited immediately work just fine for BFS, but not for DFS? Any insight is greatly appreciated!

Thank you!

score 1 · Answer 1 · answered Jan 24 '22 at 15:38

1

I don't see a difference in that respect between BFS and DFS.
I see two requirements to "marking nodes as visited":

It should not prevent pushing nodes neighbors into the stack or queue.
It should prevent pushing the node again into the stack or queue.

Those requirements apply to DFS as well as BFS, so the squance for both can be:

fetch node from stack or queue
mark node as visited
get node's neighbors
put any unvisited neighbor into the stack or queue

answered Jan 24 '22 at 15:38

c0der

18,467
6
33
65

Thanks for your response. The sources mentioned specify that the order ought to be changed. Are you saying that isn't necessarily so? – mker Jan 24 '22 at 16:21
Yes. That is what I mean. You can change iterative DFS to BFS (or back) by changing only the pull from stack to queue. – c0der Jan 24 '22 at 16:42
Thanks. So I guess the distinction is not so much about when we actually check whether a node has been visited, and more about when we actually mark as visited. I'm editing the question to supply some more code - can you take a look? – mker Jan 24 '22 at 19:24

BFS, Iterative DFS, and Recursive DFS: When to Mark Node as Visited

1 Answers1

Linked