Your premise is misleading:
the code [...] has almost all the traversal code in the critical section.
std::queue<node*> q;
q.push(head);
while (!q.empty()) {
    // Process the current contents of the queue level by level.
    int qSize = q.size();
    #pragma omp parallel for
    for (int i = 0; i < qSize; i++) {
        node* currNode;
        // std::queue is not thread-safe, so every access is guarded.
        #pragma omp critical
        {
            currNode = q.front();
            q.pop();
        }
        doStuff(currNode);
        #pragma omp critical
        q.push(currNode);
    }
}
Sure, the traversal itself is in fact in a critical section, as it must be if you are using a non-thread-safe data structure. However, the premise of this question was:
The processing function doStuff()
is quite expensive
As long as this holds true, it is not an issue that the remaining code is in a critical section. For instance, you could use Amdahl's law to compute the theoretically achievable speedup.
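As a rough sketch (the 5% serial fraction below is only an assumed number for illustration, not a measurement of your code), the bound works out like this:

#include <cstdio>

// Amdahl's law: if a fraction s of the total work must run serially
// (here: the queue operations in the critical sections), the speedup
// with n threads is bounded by 1 / (s + (1 - s) / n).
double amdahlSpeedup(double s, int n) {
    return 1.0 / (s + (1.0 - s) / n);
}

int main() {
    const double s = 0.05; // assumed serial fraction, not a measurement
    const int threadCounts[] = {2, 4, 8, 16};
    for (int n : threadCounts) {
        std::printf("%2d threads -> max speedup %.2f\n", n, amdahlSpeedup(s, n));
    }
    return 0;
}

With a 5% serial share the speedup can never exceed 20 no matter how many threads you use, so whether the critical sections matter depends entirely on how expensive doStuff is relative to the queue operations.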
All that said, if your doStuff
is very cheap, your observation is of course true. I would then recommend using a search that does not require a shared queue, such as depth-first search or iterative deepening depth-first search.
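Purely as an illustration, here is a minimal sketch of a task-based depth-first traversal that needs no shared queue at all. The node layout (a children vector), the stub doStuff, and the cutoff depth of 6 are assumptions made for the example, not taken from your code:

#include <vector>

// Assumed node layout -- adapt to your actual structure.
struct node {
    std::vector<node*> children;
};

void doStuff(node*) { /* stand-in for your processing */ }

// Depth-first traversal without a shared queue: each subtree becomes an
// independent OpenMP task, so no critical section is needed. Below the
// cutoff depth we recurse serially to keep the task overhead small when
// doStuff is cheap; the cutoff is a tuning knob.
void traverse(node* n, int depth = 0) {
    doStuff(n);
    for (node* child : n->children) {
        if (depth < 6) {
            #pragma omp task firstprivate(child, depth)
            traverse(child, depth + 1);
        } else {
            traverse(child, depth + 1);
        }
    }
}

void traverseParallel(node* head) {
    #pragma omp parallel
    #pragma omp single
    traverse(head);
}

Iterative deepening depth-first search works the same way, except that you rerun the traversal with an increasing depth limit.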