Python Breadth First Search optimisation

Question

Given this code...

import Queue

def breadthFirstSearch(graph, start, end):
    q = Queue.Queue()
    path = [start]
    q.put(path)
    while not q.empty():
        path = q.get()
        lastNode = path[len(path) - 1]
        if lastNode == end:
            return path
        for linkNode in graph[lastNode]:
            if linkNode not in path:
                newPath = []
                newPath = path + [linkNode]
                q.put(newPath)

Where graph is a dictionary representing a directed graph, eg, {'stack':['overflow'], 'foo':['bar']} ie, stack is pointing to overflow and foo is pointing to bar.

Can this breadth first search be optimised more? Because I am planning to use it on a very large dictionary.

While it's probably not a big optimization, `Queue.Queue` is intended for synchronization between multiple threads. If you just need a simple queue data structure, use `collections.deque` instead, and avoid the synchronization overhead. — Blckknght, May 25 '13 at 07:57
When I use it, I get a different answer, I dont know why though... — Ogen, May 25 '13 at 08:31

Nolen Royalty · Accepted Answer · 2013-05-25T07:06:36.547

10

Why not keep a set of visited nodes so that you don't keep hitting the same nodes? This should work since it doesn't look like you're using a weighted graph. Something like this:

import Queue

def bfs(graph, start, end):
    q = Queue.Queue()
    path = [start]
    q.put(path)
    visited = set([start])

    while not q.empty():
        path = q.get()
        last_node = path[-1]
        if last_node == end:
            return path
        for node in graph[last_node]:
            if node not in visited:
                visited.add(node)
                q.put(path + [node])

edited May 25 '13 at 07:06

answered May 25 '13 at 07:00

Nolen Royalty

18,415
4
40
50

Thanks for the reply, im going to test it now and get back to you. Thanks again. – Ogen May 25 '13 at 07:01
I dont know why, but the code is getting stuck on the line: **while not q.empty():** – Ogen May 25 '13 at 07:05
@Clay I completely forgot to add `path = q.get()` to my code, that probably doesn't help... – Nolen Royalty May 25 '13 at 07:06
You are a genius. I tried it and it works so much faster now that I dont visit 100000+ nodes twice! Thank you very much :) – Ogen May 25 '13 at 07:12
2

As an aside, I think using `collections.deque` would make it even faster, because `Queue.Queue` is heavily synchronized, which in this case is only a burden. See [this answer](http://stackoverflow.com/a/717261/517371) by Keith Gaughan. – Tobia Feb 27 '14 at 17:13
@Tobia Fun to see this answer get a comment almost a year later! I agree that deque is more appropriate here, and in retrospect I probably should have recommended it. That said, I was focused on the primary optimization, which is more significant. – Nolen Royalty Feb 28 '14 at 00:33
But isn't only one thread running this code anyway? Would the synchronization matter? – Ogen Nov 27 '15 at 11:42

Python Breadth First Search optimisation

1 Answers1