Getting the connected components in a graph

Question

I am trying to get all the connected components in a graph and print them out. I am going through each node of the graph and performing a depth-first-search(DFS) starting from that node. Here is my code:

graph = {
'a': ['b'],
'b': ['c'],
'c': ['d'],
'd': [],
'e': ['f'],
'f': []
}

def DFS(graph, start_node, stack = [], visited = []):
    stack.append(start_node)

    while stack:
        v = stack.pop()
        if v not in visited:
            visited.append(v)
            for neighbor in graph[v]:
                stack.append(neighbor)
    return visited



def dfs_get_connected_components_util(graph):
    visited = []

    for node in graph:
        if node not in visited:
            DFS_algo = DFS(graph, node)
            print(DFS_algo)
            visited = DFS_algo

print(dfs_get_connected_components_util(graph))

According to my graph, there are two connected components, a -> b -> c -> d and e -> f

Instead I get the following print out:

['c', 'd']
['c', 'd', 'a', 'b']
['c', 'd', 'a', 'b', 'f']
['c', 'd', 'a', 'b', 'f', 'e']

I can't seem to figure out what I'm doing wrong in my connected components function. I guess this may be more of a python question.

I think this relates to the issue of [mutable default arguments](https://pythonconquerstheuniverse.wordpress.com/2012/02/15/mutable-default-arguments/) — Asish M., Nov 16 '16 at 12:02
That's only one issue... but also this problem isn't defined well. If you start walking the graph from 'c', you'll never find 'a' as a connected node. So you should define exactly what you are asking. — Guy, Nov 16 '16 at 12:04
Possible duplicate of ["Least Astonishment" and the Mutable Default Argument](http://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument) — Łukasz Rogalski, Nov 16 '16 at 12:04
Undirected graphs have connected components. Directed graphs have strongly connected components. Both are equivalence relations. I believe your definitions are wrong, but this is unrelated to the python specific coding issue, which has been answered. — Kenny Ostrom, Nov 16 '16 at 22:17
Thanks @KennyOstrom. I have added a clarification about that. I assumed the graph is *undirected*. My solution doesn't find strongly connected components and isn't valid if the graph is directed. — Guy, Nov 17 '16 at 06:08
That makes sense, but if the graph is undirected, your adjacency list should show both halves of each edge. {'a': ['b'], 'b': ['a']}. Just be sure to insert both when you add an edge. — Kenny Ostrom, Nov 17 '16 at 15:21

Guy · Answer 1 · 2016-11-17T06:07:47.837

This is what I've come up with. I added some comments inline to explain what I did. Some stuff were moved to be global, just for clarity. I wouldn't usually recommend using global variable.

The key is to understand the recursion, and also remember that when doing assignment of an object (that's not a literal), you only assign the reference and not a copy of it.

Please note that this solution assumed the graph is undirected. See more details why at the notes section below.

Feel free to ask for clarifications.

from collections import defaultdict

graph = {
    'a': ['b'],
    'b': ['c'],
    'c': ['d'],
    'd': [],
    'e': ['f'],
    'f': []
}

connected_components = defaultdict(set)


def dfs(node):
    """
    The key is understanding the recursion
    The recursive assumption is:
        After calling `dfs(node)`, the `connected_components` dict contains all the connected as keys,
        and the values are *the same set* that contains all the connected nodes.
    """
    global connected_components, graph
    if node not in connected_components:
        # this is important, so neighbors won't try to traverse current node
        connected_components[node] = set()
        for next_ in graph[node]:
            dfs(next_)
            # according the recursive assumption, connected_component of `next_` is also the one of `node`
            connected_components[node] = connected_components[next_]

        # all that's left is add the current node
        connected_components[node].add(node)

for node_ in graph:
    dfs(node_)


# get all connected components and convert to tuples, so they are hashable
connected_comp_as_tuples = map(tuple, connected_components.values())

# use ``set`` to make the list of connected components distinct (without repetition)
unique_components = set(connected_comp_as_tuples)
print(unique_components)

Notes

This wasn't thoroughly tested of course... you should try with different graphs (with loops, single node components etc.)
The code could probably be improved (in terms of performance and maybe clarity). For instance, we create a set for every node, even if we don't really need one (when the node has neighbors, the set is redundant and will be overwritten).
In the original code of the OP he used mutable default arguments. That's a big NO NO (unless you really REALLY know what you are doing) and, as commented above, might have caused the issues. But not this time...
Considering @kenny-ostroms comment on the question, a word on definition (which isn't related to Python): connected components is related to undirected graphs only. For directed graphs, the term is strongly connected components. The notion is the same - for each 2 nodes in such a component (directed or undirected), there's a path between these 2 nodes. So even if node 'b' is reachable from 'a', if 'a' isn't reachable from 'b' (which could happen in directed graphs only), 'a' and 'b' will not share a connected component. My solution is not valid for a directed graph. The solution assumes the graph can be treated as undirected (in other words, if 'b' is 'a's neighbor, we assume that 'a' is 'b's neighbor).

Getting the connected components in a graph

1 Answers1

Notes