After many hours spent googling, I still have not come across an in depth, intuitive as well as solidly proven treatment of this question. The closest article I have found, linked to on some obscure discussion forum, is this: https://11011110.github.io/blog/2013/12/17/stack-based-graph-traversal.html. I have also seen this Stack Overflow question DFS vs BFS .2 differences, but the responses do not arrive at a clear consensus.
So here is the question: I have seen it stated (in Wikipedia, as well as Algorithms Illuminated by Tim Roughgarden) that, to transform a BFS implementation into an iterative DFS one, the following two changes are made:
The non-recursive implementation is similar to breadth-first search but differs from it in two ways: it uses a stack instead of a queue, and it delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before adding the vertex.
Can anyone help explain, via intuition or example, the reason for the second distinction here? Specifically: what is the differentiating factor between BFS, iterative DFS, and recursive DFS that necessitates postponing the check until after popping off the stack only for iterative DFS?
Here is a basic implementation of BFS:
def bfs(adjacency_list, source):
explored = [False] * len(adjacency_list)
queue = deque()
queue.append(source)
explored[source] = True
while queue:
node = queue.popleft()
print(node)
for n in adjacency_list[node]:
if explored[n] == False:
explored[n] = True
queue.append(n)
If we simply swap the queue for a stack, we get this implementation of DFS:
def dfs_stack_only(adjacency_list, source):
explored = [False] * len(adjacency_list)
stack = deque()
stack.append(source)
explored[source] = True
while stack:
node = stack.pop()
print(node)
for n in adjacency_list[node]:
if explored[n] == False:
explored[n] = True
stack.append(n)
The only difference between these two algorithms here is that we swapped the queue from BFS for a stack in DFS. This implementation of DFS actually produces incorrect traversals (in a non simplistic graph; possibly for a very simple graph it might anyway produce a correct traversal).
I believe that this is the 'error' referenced in the article linked above.
However, this can be fixed in one of two ways.
Either of these two implementations produces a correct traversal:
First, the implementation suggested in the sources above, with the check delayed until after popping the node from the stack. This implementation results in many duplicates on the stack.
def dfs_iterative_correct(adjacency_list, source):
explored = [False] * len(adjacency_list)
stack = deque()
stack.append(source)
while stack:
node = stack.pop()
if explored[node] == False:
explored[node] = True
print(node)
for n in adjacency_list[node]:
stack.append(n)
Alternatively, this is a popular online implementation (this one taken from Geeks for Geeks) which also produces the correct traversal. There are some duplicates on the stack, but hardly as many as the previous implementation.
def dfs_geeks_for_geeks(adjacency_list, source):
explored = [False] * len(adjacency_list)
stack = deque()
stack.append(source)
while len(stack):
node = stack.pop()
if not explored[node]:
explored[node] = True
print(node)
for n in adjacency_list[node]:
if not explored[n]:
stack.append(n)
So in summary, it seems that the difference is not solely about when you check the visited status of a node, but more about when you actually mark it as visited. Furthermore, why does marking it as visited immediately work just fine for BFS, but not for DFS? Any insight is greatly appreciated!
Thank you!