6

Given this code...

import Queue

def breadthFirstSearch(graph, start, end):
    q = Queue.Queue()
    path = [start]
    q.put(path)
    while not q.empty():
        path = q.get()
        lastNode = path[len(path) - 1]
        if lastNode == end:
            return path
        for linkNode in graph[lastNode]:
            if linkNode not in path:
                newPath = []
                newPath = path + [linkNode]
                q.put(newPath)

Where graph is a dictionary representing a directed graph, eg, {'stack':['overflow'], 'foo':['bar']} ie, stack is pointing to overflow and foo is pointing to bar.

Can this breadth first search be optimised more? Because I am planning to use it on a very large dictionary.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Ogen
  • 6,499
  • 7
  • 58
  • 124
  • 2
    While it's probably not a big optimization, `Queue.Queue` is intended for synchronization between multiple threads. If you just need a simple queue data structure, use `collections.deque` instead, and avoid the synchronization overhead. – Blckknght May 25 '13 at 07:57
  • When I use it, I get a different answer, I dont know why though... – Ogen May 25 '13 at 08:31

1 Answers1

10

Why not keep a set of visited nodes so that you don't keep hitting the same nodes? This should work since it doesn't look like you're using a weighted graph. Something like this:

import Queue

def bfs(graph, start, end):
    q = Queue.Queue()
    path = [start]
    q.put(path)
    visited = set([start])

    while not q.empty():
        path = q.get()
        last_node = path[-1]
        if last_node == end:
            return path
        for node in graph[last_node]:
            if node not in visited:
                visited.add(node)
                q.put(path + [node])
Nolen Royalty
  • 18,415
  • 4
  • 40
  • 50
  • Thanks for the reply, im going to test it now and get back to you. Thanks again. – Ogen May 25 '13 at 07:01
  • I dont know why, but the code is getting stuck on the line: **while not q.empty():** – Ogen May 25 '13 at 07:05
  • @Clay I completely forgot to add `path = q.get()` to my code, that probably doesn't help... – Nolen Royalty May 25 '13 at 07:06
  • You are a genius. I tried it and it works so much faster now that I dont visit 100000+ nodes twice! Thank you very much :) – Ogen May 25 '13 at 07:12
  • 2
    As an aside, I think using `collections.deque` would make it even faster, because `Queue.Queue` is heavily synchronized, which in this case is only a burden. See [this answer](http://stackoverflow.com/a/717261/517371) by Keith Gaughan. – Tobia Feb 27 '14 at 17:13
  • @Tobia Fun to see this answer get a comment almost a year later! I agree that deque is more appropriate here, and in retrospect I probably should have recommended it. That said, I was focused on the primary optimization, which is more significant. – Nolen Royalty Feb 28 '14 at 00:33
  • But isn't only one thread running this code anyway? Would the synchronization matter? – Ogen Nov 27 '15 at 11:42