1

Let's say I implemented a simple version of an iterative DFS like this:

import sys
import traceback


def dfs(graph, start):
    visited, stack = [], [start]
    while stack:
        node = stack.pop()

        if node not in visited:
            visited.append(node)
            childs = reversed(graph.get(node, list()))
            stack.extend([item for item in childs if item not in visited])

    return visited

if __name__ == "__main__":
    graphs = [
        {
            'A': ['B', 'C'],
            'B': ['D']
        }
    ]

    for i, g in enumerate(graphs):
        try:
            print "{0}Graph {1}{2}".format('-' * 40, i, '-' * 33)

            for f in [dfs]:
                print f.__name__, '-->', f(g, 'A')
                print '-' * 80
        except Exception as e:
            print "Exception in user code: {0}".format(e)
            print '-' * 60
            traceback.print_exc(file=sys.stdout)
            print '-' * 60

The output of the above snippet is this:

----------------------------------------Graph 0---------------------------------
dfs --> ['A', 'B', 'D', 'C']
--------------------------------------------------------------------------------

Now, I'm trying to figure out how to get the following output (instead running node's method just printing is fine):

A_start, B_start, D_start, D_end, B_end, A_middle, C_start, C_end, A_end

*_middle will only be executed between subnodes execution. For instance, if a node doesn't have any subnodes, or has only a single one, it never gets executed. That's why my desired output only has A_middle (none of the B_middle, C_middle, D_middle) in the above example.

How can I do this?

EDIT:

Trying to find the recursive solution to my problem:

def dfs(graph, node):
    if node not in graph:
        return

    print '{0}_start'.format(node)

    for i, node in enumerate(graph[node]):
        if i > 0:
            print '{0}_middle'.format(node)

        dfs(graph, node)

    print '{0}_end'.format(node)

if __name__ == "__main__":
    graphs = [
        {
            'A': ['B', 'C'],
            'B': ['D']
        }
    ]

    for i, g in enumerate(graphs):
        try:
            print "{0}Graph {1}{2}".format('-' * 40, i, '-' * 33)

            for f in [dfs]:
                print f.__name__, '-->'
                f(g, 'A')
                print '-' * 80
        except Exception as e:
            print "Exception in user code: {0}".format(e)
            print '-' * 60
            traceback.print_exc(file=sys.stdout)
            print '-' * 60

Will give me the wrong output:

----------------------------------------Graph 0---------------------------------
dfs -->
A_start
B_start
D_end
C_middle
C_end
--------------------------------------------------------------------------------
halfer
  • 19,824
  • 17
  • 99
  • 186
BPL
  • 9,632
  • 9
  • 59
  • 117
  • 1
    Your iterative implementation is, in fact, forever done with any node immediately after reading its children. A recursive implementation would make it much easier to get the output you want. Otherwise you would need another data structure to keep track of when a node's "children which actually got added to the stack while reading that node" are processed. – Kenny Ostrom Sep 09 '16 at 18:34

3 Answers3

2

You were really close with your recursive attempt actually.
I added comments for the small tweaks I made.

import sys, traceback

def dfs(graph, node):
    print '{0}_start'.format(node)  # need this right at the top
    if node not in graph:
        print '{0}_end'.format(node)  # need to record the end if we can't find
        return

    for i, nd in enumerate(graph[node]):  # need a different `node` variable here!!!
        if i > 0:
            print '{0}_middle'.format(node)

        dfs(graph, nd)

    print '{0}_end'.format(node)

if __name__ == "__main__":
    graphs = [
        {
            'A': ['B', 'C'],
            'B': ['D']
        }
    ]

    for i, g in enumerate(graphs):
        try:
            print "{0}Graph {1}{2}".format('-' * 40, i, '-' * 33)

            for f in [dfs]:
                print f.__name__, '-->'
                f(g, 'A')
                print '-' * 80
        except Exception as e:
            print "Exception in user code: {0}".format(e)
            print '-' * 60
            traceback.print_exc(file=sys.stdout)
            print '-' * 60

This produces the output you were looking for.

Gerrat
  • 28,863
  • 9
  • 73
  • 101
2

As the other answers have shown, the main issue with your current recursive code is the base case:

if node not in graph:
    return

This incorrectly skips the output when there are no children from a node. Get rid of those lines and, just use enumerate(graph.get(start, [])) instead of enumerate(graph[start]) in the for loop and it should work as desired.

Making your iterative code work is quite a bit more complicated. One way of attempting it would be to push 2-tuples to the stack. The first value would a node, as before, but the second will be either a predecessor of the node (so we can print a middle message for the parent), or None indicating that we need to print the end marker for the node.

However, keeping track of which nodes have been visited gets a bit more complicated. Rather than a single list of nodes, I'm using a dictionary mapping from node to an integer. A non-existent value means the node has not yet been visited. A one means the node has been visited and it's start message has been printed. A 2 means that at least one of the node's children has been visited, and each further child should print a middle message on the parent's behalf. A 3 means the end message has been printed.

def dfs(graph, start):
    visited = {}
    stack = [(start, "XXX_THIS_NODE_DOES_NOT_EXIST_XXX")]
    while stack:
        node, parent = stack.pop()
        if parent is None:
            if visited[node] < 3:
                print "{}_end".format(node)
            visited[node] = 3

        elif node not in visited:
            if visited.get(parent) == 2:
                print "{}_middle".format(parent)
            elif visited.get(parent) == 1:
                visited[parent] = 2

            print "{}_start".format(node)
            visited[node] = 1
            stack.append((node, None))
            for child in reversed(graph.get(node, [])):
                if child not in visited:
                    stack.append((child, node))

Because I'm using an dictionary for visited, returning it at the end is probably not appropriate, so I've removed the return statement. I think you could restore it if you really wanted to by using a collections.OrderedDict rather than a normal dict, and returning its keys().

Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • Thanks, it's perfect. It solves the main problem I got designing this algorithm as an iterative version. I've added some comments in my question related to my validation answer decision – BPL Sep 10 '16 at 11:46
  • Another [question](http://stackoverflow.com/questions/39427638/traversing-subgraphs-multiple-times-dfs-algorithm) related to this one – BPL Sep 10 '16 at 18:25
1

I suspect this does what you want.

graphs = [
        {
            'A': ['B', 'C'],
            'B': ['D']
        }
    ]

def dfs(graph, start):
    print '{}_start'.format(start)
    try:
        for child in graph[start]:
            dfs(graph, child)
            print '{}_middle'.format(start)
    except KeyError:
        # We found a leaf node, it has no children.
        pass
    print '{}_end'.format(start)   

# Test it for one graph
dfs(graphs[0], 'A')

# Output:

# A_start
# B_start
# D_start
# D_end
# B_middle
# B_end
# A_middle
# C_start
# C_end
# A_middle
# A_end
ffledgling
  • 11,502
  • 8
  • 47
  • 69
  • There was an error in the indentation, please look at the edit. Also, if this answer does what you want, remember to mark it as accepted! – ffledgling Sep 09 '16 at 22:07
  • To meet the stated requirements, you would need to print the middle message only between different children of the same node. – Kenny Ostrom Sep 09 '16 at 22:17
  • @KennyOstrom and you're saying this doesn't do that? The only edge case I see is when something has a single child. If you want it to explicitly print *between* two sibling nodes, then there needs to be an additional condition within the `for` loop iterating over the children. – ffledgling Sep 09 '16 at 22:23