154

How do you trace the path of a Breadth-First Search, such that in the following example:

If searching for key 11, return the shortest list connecting 1 to 11.

[1, 4, 7, 11]
Community
  • 1
  • 1
Christopher Markieta
  • 5,674
  • 10
  • 43
  • 60
  • 7
    It was actually an old assignment I was helping a friend on months ago, based on the Kevin Bacon Law. My final solution was very sloppy, I basically did another Breadth-first search to "rewind" and backtrack. I wan't to find a better solution. – Christopher Markieta Jan 19 '12 at 06:58
  • 26
    Excellent. I consider revisiting an old problem in an attempt to find a better answer to be an admirable trait in an engineer. I wish you well in your studies and career. – Peter Rowell Jan 19 '12 at 17:14
  • 3
    Thanks for the praise, I just believe if I don't learn it now, I will be faced with the same problem again. – Christopher Markieta Jan 20 '12 at 06:23
  • possible duplicate of [How to get the path between 2 nodes using Breadth-First Search?](http://stackoverflow.com/questions/5149962/how-to-get-the-path-between-2-nodes-using-breadth-first-search) – nbro Jul 24 '15 at 15:27

7 Answers7

271

You should have look at http://en.wikipedia.org/wiki/Breadth-first_search first.


Below is a quick implementation, in which I used a list of list to represent the queue of paths.

# graph is in adjacent list representation
graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a 
        # new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)

print bfs(graph, '1', '11')

This prints: ['1', '4', '7', '11']


Another approach would be maintaining a mapping from each node to its parent, and when inspecting the adjacent node, record its parent. When the search is done, simply backtrace according the parent mapping.

graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path
        

def bfs(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if node not in queue :
                parent[adjacent] = node # <<<<< record its parent 
                queue.append(adjacent)

print bfs(graph, '1', '11')

The above codes are based on the assumption that there's no cycles.

Intrastellar Explorer
  • 3,005
  • 9
  • 52
  • 119
qiao
  • 17,941
  • 6
  • 57
  • 46
  • 2
    This is excellent! My thought process lead me to believe in creating some type of table or matrix, I have yet to learn about graphs. Thank you. – Christopher Markieta Jan 19 '12 at 07:04
  • 1
    I also tried using a back tracing approach although this seems much cleaner. Would it be possible to make a graph if you only know the start and the end but none of the nodes in-between? Or even another approach besides graphs? – Christopher Markieta Jan 19 '12 at 07:19
  • @ChristopherM I failed to understand your question :( – qiao Jan 19 '12 at 07:30
  • If we know the start node is "1" and the end node is "11" but the other nodes are unknown until we begin to search them; such that the graph is dynamic and will still work even if the tree had 100 nodes. For ex: { '1': [a, b, c], a: [d, e], d: [f, g], c: [h, i], h: [11, k] } – Christopher Markieta Jan 19 '12 at 07:45
  • @ChristopherM As long as the adjacent nodes are being determined when a node is reached, then the above algorithm will be fine. – qiao Jan 19 '12 at 08:01
  • Since you've used a hash table (`parent`) in the second method. It's easy to modify the method to handle cycles. Actually, BFS is often implemented using a queue with a hash table(to mark if a node is visited). – Ray Oct 08 '13 at 18:21
  • 2
    Is it possible to adapt the first algorithm so that it will return *all* paths from 1 to 11 (assuming there is more than one)? – Maria Ines Parnisari Oct 05 '14 at 04:41
  • 1
    @l19 When you find a path (`node==end`), add that path to another list that will contain all the paths you found, then `continue` instead of `return`. If you're using a visited set to prevent cycles, don't add your end node to the visited set ever (otherwise only one path can ever have that ending node). – Dominic K Jun 26 '15 at 00:21
  • @qiao is there a way to print all of the parents children up to a given depth? – Pavlos Panteliadis Dec 09 '16 at 19:38
  • 2
    It's recommended to use collections.deque instead of a list. list.pop(0)'s complexity is O(n) while deque.popleft() is O(1) – Omar_0x80 Dec 29 '18 at 16:17
  • Am I wrong or is the time complexity of the first method at least quadratic in the worst case? The problem I perceive is that `new_path = list(path)` copies the entire path over. So, assume that we're searching for a node that is the last node traversed in the graph in your implementation. Then for all L1 nodes 1 edge away we'll have made a list with 1 entry, for all L2 nodes 2 edges a way a list with 2 entries, etc. So in total, if L=max(L1, L2,...Lk), we can bound number of copies by (1 + 2 + ... + k)L which is O(L*k^2), and if k is O(n) it's O(Ln^2). Am I wrong somewhere? – EnriqueC Apr 05 '20 at 19:39
  • Just realized that O(|E| + |V|) could easily be quadratic in the number of vertices if there are enough edges. But I think my point stands becuase if L is sufficiently large you could have O(n^3)? IDK my point in the end is I'm just wondering if that line is inefficient as you're recopying a lot of elements. – EnriqueC Apr 05 '20 at 20:01
  • how would you modify your algorithms if there are cycles in the graph? – wmock May 02 '20 at 05:05
  • @DominicK - thanks for the suggestion retrieving all *possible* paths by adding paths into separate list instead of returning it.. however i run into infinite loop finding all paths between 'C' and 'D' using graph = { 'A': ['B', 'D', 'C'], 'B': ['A', 'C', 'D'], 'C': ['B', 'A'], 'D': ['B', 'A'] } – Vijay Sep 10 '20 at 22:08
40

Very easy code. You keep appending the path each time you discover a node.

graph = {
         'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])
         }
def retunShortestPath(graph, start, end):

    queue = [(start,[start])]
    visited = set()

    while queue:
        vertex, path = queue.pop(0)
        visited.add(vertex)
        for node in graph[vertex]:
            if node == end:
                return path + [end]
            else:
                if node not in visited:
                    visited.add(node)
                    queue.append((node, path + [node]))
SeasonalShot
  • 2,357
  • 3
  • 31
  • 49
30

I liked qiao's first answer very much! The only thing missing here is to mark the vertexes as visited.

Why we need to do it?
Lets imagine that there is another node number 13 connected from node 11. Now our goal is to find node 13.
After a little bit of a run the queue will look like this:

[[1, 2, 6], [1, 3, 10], [1, 4, 7], [1, 4, 8], [1, 2, 5, 9], [1, 2, 5, 10]]

Note that there are TWO paths with node number 10 at the end.
Which means that the paths from node number 10 will be checked twice. In this case it doesn't look so bad because node number 10 doesn't have any children.. But it could be really bad (even here we will check that node twice for no reason..)
Node number 13 isn't in those paths so the program won't return before reaching to the second path with node number 10 at the end..And we will recheck it..

All we are missing is a set to mark the visited nodes and not to check them again..
This is qiao's code after the modification:

graph = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [10],
    4: [7, 8],
    5: [9, 10],
    7: [11, 12],
    11: [13]
}


def bfs(graph_to_search, start, end):
    queue = [[start]]
    visited = set()

    while queue:
        # Gets the first path in the queue
        path = queue.pop(0)

        # Gets the last node in the path
        vertex = path[-1]

        # Checks if we got to the end
        if vertex == end:
            return path
        # We check if the current node is already in the visited nodes set in order not to recheck it
        elif vertex not in visited:
            # enumerate all adjacent nodes, construct a new path and push it into the queue
            for current_neighbour in graph_to_search.get(vertex, []):
                new_path = list(path)
                new_path.append(current_neighbour)
                queue.append(new_path)

            # Mark the vertex as visited
            visited.add(vertex)


print bfs(graph, 1, 13)

The output of the program will be:

[1, 4, 7, 11, 13]

Without the unneccecery rechecks..

Or Kazaz
  • 549
  • 1
  • 9
  • 16
  • 10
    It might be useful to use `collections.deque` for `queue` as list.pop(0) incur `O(n)` memory movements. Also, for the sake of posterity, if you want to do DFS just set `path = queue.pop()` in which case the variable `queue` actually acts like a `stack`. – wadkar Feb 19 '16 at 06:13
  • This will revisit nodes that neighbour visited nodes, e.g. in a situation with three nodes 1-2-3, it will visit 1, add 2 to the queue, then add 1 and 3 to the queue. The `if vertex not in visited` check should be in the for loop instead of outside it. Then the outer check can be removed because nothing will be added to the queue if the node has already been visited. – applemonkey496 Jan 13 '21 at 02:41
8

I thought I'd try code this up for fun:

graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def bfs(graph, forefront, end):
    # assumes no cycles

    next_forefront = [(node, path + ',' + node) for i, path in forefront if i in graph for node in graph[i]]

    for node,path in next_forefront:
        if node==end:
            return path
    else:
        return bfs(graph,next_forefront,end)

print bfs(graph,[('1','1')],'11')

# >>>
# 1, 4, 7, 11

If you want cycles you could add this:

for i, j in for_front: # allow cycles, add this code
    if i in graph:
        del graph[i]
Rusty Rob
  • 16,489
  • 8
  • 100
  • 116
  • 1
    after you have built the next_for_front. A follow on question, what if the graph contains loops? E.g. if node 1 had an edge connecting back to itself? What if the graph has multiple edges going between two nodes? – Rusty Rob Dec 03 '13 at 21:33
1

I like both @Qiao first answer and @Or's addition. For a sake of a little less processing I would like to add to Or's answer.

In @Or's answer keeping track of visited node is great. We can also allow the program to exit sooner that it currently is. At some point in the for loop the current_neighbour will have to be the end, and once that happens the shortest path is found and program can return.

I would modify the the method as follow, pay close attention to the for loop

graph = {
1: [2, 3, 4],
2: [5, 6],
3: [10],
4: [7, 8],
5: [9, 10],
7: [11, 12],
11: [13]
}


    def bfs(graph_to_search, start, end):
        queue = [[start]]
        visited = set()

    while queue:
        # Gets the first path in the queue
        path = queue.pop(0)

        # Gets the last node in the path
        vertex = path[-1]

        # Checks if we got to the end
        if vertex == end:
            return path
        # We check if the current node is already in the visited nodes set in order not to recheck it
        elif vertex not in visited:
            # enumerate all adjacent nodes, construct a new path and push it into the queue
            for current_neighbour in graph_to_search.get(vertex, []):
                new_path = list(path)
                new_path.append(current_neighbour)
                queue.append(new_path)

                #No need to visit other neighbour. Return at once
                if current_neighbour == end
                    return new_path;

            # Mark the vertex as visited
            visited.add(vertex)


print bfs(graph, 1, 13)

The output and everything else will be the same. However, the code will take less time to process. This is especially useful on larger graphs. I hope this helps someone in the future.

Darie Dorlus
  • 396
  • 3
  • 8
1

With cycles included in the graph, would not something like this work better?

from collections import deque

graph = {
    1: [2, 3, 4],
    2: [5, 6, 3],
    3: [10],
    4: [7, 8],
    5: [9, 10],
    7: [11, 12],
   11: [13]
}


def bfs1(graph_to_search, start, end):
    queue = deque([start])
    visited = {start}
    trace = {}

    while queue:
        # Gets the first path in the queue
        vertex = queue.popleft()
        # Checks if we got to the end
        if vertex == end:
            break

        for neighbour in graph_to_search.get(vertex, []):
            # We check if the current neighbour is already in the visited nodes set in order not to re-add it
            if neighbour not in visited:
            # Mark the vertex as visited
                visited.add(neighbour)
                trace[neighbour] = vertex
                queue.append(neighbour)

path = [end]
while path[-1] != start:
    last_node = path[-1]
    next_node = trace[last_node]
    path.append(next_node)

return path[::-1]

print(bfs1(graph,1, 13))

This way only new nodes will be visited and moreover, avoid cycles.

Leo oeL
  • 11
  • 2
0

Javascript version and search first/all paths ...

P.S, Graph with cylces works well.

Your can convert it to python , it's easy

function search_path(graph, start, end, exhausted=true, method='bfs') {
    // note. Javascript Set is ordered set...
    const queue = [[start, new Set([start])]]
    const visited = new Set()
    const allpaths = []
    const hashPath = (path) => [...path].join(',') // any path hashing method
    while (queue.length) {
        const [node, path] = queue.shift()
        // visited node and its path instant. do not modify it others place
        visited.add(node)
        visited.add(hashPath(path))
        for (let _node of graph.get(node) || []) {
            // the paths already has the node, loops mean nothing though.
            if (path.has(_node))
                continue;
            // now new path has no repeated nodes.
            let newpath = new Set([...path, _node])
            if (_node == end){
                allpaths.push(newpath)
                if(!exhausted) return allpaths; // found and return 
            }
            else {
                if (!visited.has(_node) || // new node till now
                   // note: search all possible including the longest path
                   visited.has(_node) && !visited.has(hashPath(newpath))
                ) { 
                    if(method == 'bfs')
                       queue.push([_node, newpath])
                    else{
                       queue.unshift([_node, newpath])
                    }
                }
            }
        }
    }
    return allpaths
}

output like this..

[
    [ 'A', 'C' ],
    [ 'A', 'E', 'C'],
    [ 'A', 'E', 'F', 'C' ] // including F in `A -> C`
]
Tearf001
  • 95
  • 7