Parallelizing a Breadth-First Search

Question

I just taught myself some OpenMP, so this could be a stupid question. Basically, I'm trying to parallelize a breadth-first search program in C++, where each node takes a long time to process. Here's some example code:

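queue<node*> q;
q.push(head);
while (!q.empty()) {
  int qSize = q.size();  // number of nodes on the current level
  for (int i = 0; i < qSize; i++) {
    node* currNode = q.front();
    q.pop();
    doStuff(currNode);  // expensive per-node work
    q.push(currNode);   // example only; a real BFS would push the next level's nodes
  }
}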

The processing function doStuff() is quite expensive and I want to parallelize it. However, if I parallelize the for loop by putting #pragma omp parallel for right before the for line, all kinds of weird errors pop up at runtime. I'm guessing the reason is that q.front() and q.push() also get parallelized this way, so multiple threads may get the same node through q.front() (because they all call it before any q.pop() has been processed).

How can I get around this?

Breadth-first search is inherently a sequential algorithm, since each step depends on the results of the previous steps. There are some techniques for parallelizing it — do a web search for "parallel tree traversal" — but it's not going to be as simple as an OpenMP `parallel for`. — Wyzard, May 20 '17 at 07:01
@Wyzard you must be confusing BFS with DFS. Computing the front of a BFS in parallel is just fine. — Zulan, May 20 '17 at 08:04
@Zulan I still didn't get the idea of putting `q.front()` and `q.pop()` in a critical section. Let's take a binary search tree where, at a given moment, the queue size is 4, which indicates that there are 4 nodes at a given depth. Why do we need to pop these in a critical section? If we create threads and run them in parallel, all of them will call q.front() and then pop the first element in parallel, and later push the relevant next nodes, which will all be at the same depth, so this can again be done in parallel. So what's the reason for the critical section? Kindly explain. — asn, Nov 04 '19 at 17:48
You need the critical sections because `std::queue` does not provide thread-safe accessors on its own. If you called these functions in parallel, you would have a race condition. Please see https://stackoverflow.com/questions/34510/what-is-a-race-condition — Zulan, Nov 04 '19 at 20:29

score 6 · Accepted Answer · answered May 20 '17 at 08:32

The solution is to protect access to the queue with a critical section.

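queue<node*> q;
q.push(head);
while (!q.empty()) {
  int qSize = q.size();  // fixed before the parallel loop starts
  #pragma omp parallel for
  for (int i = 0; i < qSize; i++) {
    node* currNode;
    // Only one thread at a time may read and remove the front element.
    #pragma omp critical
    {
      currNode = q.front();
      q.pop();
    }
    doStuff(currNode);  // runs in parallel, outside any critical section
    // Pushing also modifies the queue, so it must be protected as well.
    #pragma omp critical
    q.push(currNode);
  }
}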

This is similar to having a common mutex and locking it.
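To make the mutex comparison concrete, here is a minimal sketch of one level of the loop with an explicit lock instead of `#pragma omp critical`. The `node` struct, the body of `doStuff`, and the function name `processLevel` are hypothetical stand-ins, and the sketch assumes that `std::mutex` works with the threads your OpenMP implementation creates, which holds for common compilers but is worth checking for yours.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical node type, for illustration only.
struct node { bool done = false; };

// Hypothetical stand-in for the expensive per-node work.
void doStuff(node* n) { n->done = true; }

// Processes every node currently in q once. All queue access is
// guarded by one shared mutex, mirroring the critical sections above.
void processLevel(std::queue<node*>& q) {
  std::mutex m;
  const int qSize = static_cast<int>(q.size());
  #pragma omp parallel for
  for (int i = 0; i < qSize; i++) {
    node* currNode;
    {
      std::lock_guard<std::mutex> lock(m);  // released at end of scope
      currNode = q.front();
      q.pop();
    }
    assert(currNode != nullptr);
    doStuff(currNode);  // runs without holding the lock
    {
      std::lock_guard<std::mutex> lock(m);
      q.push(currNode);
    }
  }
}
```

As in the original example, the node is pushed back unchanged, so the queue has the same size after the call.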

There are some limits in efficiency with this version: At the end of the for loop, some threads may idle, despite work being in the queue. Making a version where threads continuously work whenever there is something in the queue is a bit tricky in terms of handling the situations where the queue is empty but some threads are still computing.

Depending on the data size involved in a node, you may also see a significant performance impact from cache effects and false sharing. But that can't be discussed without a specific example. The simple version will probably be sufficiently efficient in many cases, but getting optimal performance may become arbitrarily complex.
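If contention on the shared queue turns out to matter, one alternative (a different technique from the code above, not a drop-in replacement) is to keep each level in a `std::vector`, read it by index without locking, and let each thread collect next-level nodes in a private buffer that is merged once per thread. The sketch below assumes a tree, so every node is reached from exactly one parent; the `node` layout with a `children` vector, the `visited` flag, and the name `parallelBfs` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type: a tree node with child pointers.
struct node {
  std::vector<node*> children;
  bool visited = false;
};

// Hypothetical stand-in for the expensive per-node work.
void doStuff(node* n) { n->visited = true; }

// Level-by-level traversal. Threads only read `frontier`, so the
// per-node part needs no lock; only the merge into `next` does.
void parallelBfs(node* head) {
  assert(head != nullptr);
  std::vector<node*> frontier{head};
  while (!frontier.empty()) {
    std::vector<node*> next;
    const int n = static_cast<int>(frontier.size());
    #pragma omp parallel
    {
      std::vector<node*> local;  // private to each thread
      #pragma omp for nowait
      for (int i = 0; i < n; i++) {
        node* curr = frontier[i];
        doStuff(curr);
        local.insert(local.end(), curr->children.begin(),
                     curr->children.end());
      }
      // One merge per thread per level instead of one lock per node.
      #pragma omp critical
      next.insert(next.end(), local.begin(), local.end());
    }
    frontier.swap(next);
  }
}
```

The order of nodes within a level is not deterministic here, and a general graph would additionally need a thread-safe visited check.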

In any case, you have to ensure that doStuff does not modify any global or shared state.

Is it necessary to put `q.push(currNode)` in a critical section? This for loop only runs as many times as the size of the queue determined at the beginning of each level, and q.push() only pushes to the end of the queue. What do you think? — Xianghai Sheng, May 22 '17 at 21:28
**Yes**. It is absolutely necessary to protect all access to `q` in this case as `std::queue` is not a concurrent data structure. Take a look at [this talk](https://www.youtube.com/watch?v=c1gO9aB9nbs) to get a glimpse of the things that you need to think about when considering concurrent data structures. — Zulan, May 23 '17 at 08:08
Why do we need both `q.front()` and `q.pop()`? Why can't we set `currNode` directly with `currNode = q.pop();`? — pretzlstyle, Oct 24 '22 at 17:33
@pretzlstyle unfortunately `std::queue<>::pop` [does not return the value](https://stackoverflow.com/a/25035949/620382). — Zulan, Nov 21 '22 at 11:37
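
If the two-step pattern is a nuisance, it can be wrapped in a small helper. This is a minimal sketch; `popFront` is a hypothetical name, not part of the standard library, and the helper is no more thread-safe than `std::queue` itself, so a call to it still belongs inside the critical section.

```cpp
#include <cassert>
#include <queue>
#include <utility>

// Hypothetical helper: removes and returns the front element.
// Precondition: the queue is not empty.
template <typename T>
T popFront(std::queue<T>& q) {
  assert(!q.empty());
  T value = std::move(q.front());
  q.pop();
  return value;
}
```

With this, the critical section in the answer could read `currNode = popFront(q);`.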
