Parallel breadth first search

Question

I am comparing difference in time execution of two breadth first search algorithms. Parallel and non parallel. But for every graph I make, the non parallel version is 10X faster than the parallel one.

This is parallel Breadth first search. I don't know where the problem is but I suppose somewhere in this method:

public static int PBFS(Node start, Node end)
{
    var queue = new ConcurrentQueue<Node>();
    queue.Enqueue(start);

    while (queue.Count != 0)
    {
        bool isFinished = false;
        if (isFinished) break;

        Parallel.ForEach<Node>(queue, CurrentNode =>
        {
            if (queue.TryDequeue(out CurrentNode))
            {
                foreach (Node Child in CurrentNode.Children.Keys)
                {
                    if (Child.IsVisited == false)
                    {
                        Child.IsVisited = true; 
                        if (Child == end) { isFinished = true; return; }
                    }
                    queue.Enqueue(Child);
                }
            }  
            return;
        });
    } return 1;
}

This is non parallel BFS:

public static int BFS(Node start, Node end)
{
    Queue<Node> queue = new Queue<Node>();
    queue.Enqueue(start);

    while (queue.Count != 0)
    {
        Node CurrentNode = queue.Dequeue();
        CurrentNode.IsVisited = true;

        foreach (Node Child in CurrentNode.Children.Keys)
        {
            if (Child.IsVisited == false)
            {
                Child.IsVisited = true;
                //Console.Write(Child.Name+" ");
                if (Child == end) return 1;
            }
            queue.Enqueue(Child);
        }
        // Console.WriteLine();
    }

    // Console.WriteLine();
    return 0;
}

I know this is not what you're asking, but your parallel code always returns 1. I recommend moving the declaration of `isFinished` outside of the while loop and calling it soemthing like `found`. Then, at the end of your function, return 1 if `found` or 0 otherwise. — Jon Senchyna, Sep 06 '12 at 17:51
that return is just because the function is int. doesn't mean anything — bljuvko, Sep 06 '12 at 17:55
You can remove a small amount of contention by creating a new temp `ConcurrentQueue` at the beginning of each iteration of your `while` loop, enqueueing children to that queue, and then making `queue` reference your (now full) temp queue at the end of each iteration of the `while` loop. — Jon Senchyna, Sep 06 '12 at 18:51
In addition, your parallel code is only checking `isFinished` when it starts an iteratrion of the `while` loop. If you have a large queue, that means you'll spend a lot of time doing useless cycles in your `Parallel.Foreach` once you find your result. If your code is queueing children faster than `Parallel.Foreach` pulls them out of the queue, you may end up going through your whole tree before your code finds out that it found the result way back around when it started. — Jon Senchyna, Sep 06 '12 at 18:54
You're traversing the children of each node in parallel but putting the child nodes in a common queue. You're still processing each node sequentially but now with extra overhead. And since you're just traversing the tree and not doing any processor-intensive work (such as string comparison) you won't see any benefit from parallelization. However, if you switch to a true parallel traversing and inject some real work then you'll see a significant improvement with multiple cores. Broadhurst has an example: (http://stackoverflow.com/questions/7099703/parallel-tree-traversal-in-c-sharp). — Ed Power, Sep 07 '12 at 18:57

score 4 · Accepted Answer · answered Sep 06 '12 at 17:41

Parallelisation and concurrency with shared data requires synchronization. Synchronization is expensive--as you're likely witnessing. ConcurrentQueue has it's own synchronization which likely isn't optimal for your situation.

Parallelization beyond the number of CPUs (which likely isn't occurring here, but it's relevant) incurs a lot of Context Switches--which are expensive and reduce the productivity of the code that's running in parallel. i.e. the premise of throwing more threads at a problem often produces the opposite effect.

If performance is a concern, I think you might want to look at a different design, maybe Actor-based, message-passing, or producer/consumer to avoid having shared data and avoid synchronization of that shared data.

score 1 · Answer 2 · answered Mar 27 '13 at 12:34

I suggest you first write a parallel bidirectional BFS: you create two search threads, one going from the start node along the "arrows", and the other one going in reversed direction starting from the goal node, and terminate the both threads when their search frontiers "meet". For instance, in Java that would look like [this].

Parallel breadth first search

2 Answers2