Let me expand on my comment. Unless you need your code to also compile and work with MS Visual C++, you can avoid the complexity of providing "linear" iterators to graph objects by using explicit OpenMP tasks instead. Explicit tasking was introduced in OpenMP 3.0, hence it is not supported by MSVC, which even in 2012 only conforms to the much older OpenMP 2.0 specification. Tasks are blocks of code that can execute in parallel. They are created with the task construct:
... other code ...

#pragma omp task
{
    ... task code ...
}

... other code ...
Whenever the execution flow reaches the marked region, a new task object is created and put into a task queue. Then, at certain points in time, idle threads grab one task from the queue and start executing it. Tasks are much like OpenMP sections in that they inherit their data environment and that they can run in a different order than in the serial version of the code.
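To get a feel for this behaviour, here is a minimal self-contained sketch (the loop count and output format are of course made up for illustration). The parallel/single scaffolding that launches the tasks is explained further below:

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < 8; i++)
            {
                // Each created task may be picked up by any idle thread
                #pragma omp task firstprivate(i)
                printf("task %d executed by thread %d\n",
                       i, omp_get_thread_num());
            }
        } // implicit barrier here - all tasks complete before it is crossed
    }
    return 0;
}

Run with several threads, this typically prints the task numbers interleaved and spread over different thread IDs, i.e. the tasks really do run out of order.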
With tasks one can implement recursive algorithms and also easily work with C++ containers that do not provide random-access iterators. For example, traversal of a binary tree can be performed with tasks like this:
// Helper routine to traverse a tree and create one task for each node
void traverse_and_make_tasks(tree_node *tn)
{
    if (tn == NULL)
        return;

    // Create a task to process the node
    #pragma omp task
    {
        very_long_computation(tn->value);
    }

    traverse_and_make_tasks(tn->left);
    traverse_and_make_tasks(tn->right);
}
... main code ...

// Disable dynamic teams
omp_set_dynamic(0);

#pragma omp parallel
{
    #pragma omp single nowait
    {
        traverse_and_make_tasks(tree->root_node);
    }
}
(Disabling dynamic teams prevents the OpenMP run-time from being too smart and executing the parallel region with a single thread only.)
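As a quick sanity check (not part of the original code), one can print the team size from inside the region to verify that the run-time actually honours the request:

omp_set_dynamic(0);      // forbid the run-time from shrinking the team
omp_set_num_threads(4);  // the value 4 here is an arbitrary example
#pragma omp parallel
{
    #pragma omp single
    printf("team size: %d\n", omp_get_num_threads());
}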
This is a very common tasking pattern known as the single/serial producer. Whenever execution enters the parallel region, a single thread executes the code in the single construct. It calls traverse_and_make_tasks with the root node of the tree, and traverse_and_make_tasks walks the tree and creates one task for each node. The task construct only works when used inside a parallel region (static scoping) or in code called (directly or indirectly) from inside a parallel region (dynamic scoping). As traverse_and_make_tasks walks the tree, it produces tasks that get queued. At the end of the parallel region there is an implicit barrier, which is also a task scheduling point; roughly speaking, execution will not resume past the end of the region until all queued tasks have been executed. One can also put explicit synchronisation points inside the parallel region using #pragma omp taskwait. It works similarly to barrier - execution "blocks" until all child tasks of the current task have been processed.
Another common pattern is the parallel producer, where tasks are generated in parallel. The example code above can easily be transformed into a parallel producer with a simple modification of traverse_and_make_tasks:
void traverse_and_make_tasks(tree_node *tn)
{
    if (tn == NULL)
        return;

    #pragma omp task
    traverse_and_make_tasks(tn->left);

    #pragma omp task
    traverse_and_make_tasks(tn->right);

    // Process the node itself
    very_long_computation(tn->value);
}
This version of the code creates two tasks at each node - one that descends into the left subtree and one that descends into the right subtree. If this were serial code, it would traverse the tree from the bottom up. In the parallel case, however, task queueing results in a more or less top-to-bottom traversal.
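For illustration, the driver stays the same as in the single-producer case; only the root call runs serially, since every node then spawns its own tasks. The make_node helper below is hypothetical and assumes integer node values:

// Hypothetical helper to build a small test tree
tree_node *make_node(int value, tree_node *left, tree_node *right)
{
    tree_node *tn = new tree_node;
    tn->value = value;
    tn->left  = left;
    tn->right = right;
    return tn;
}

tree_node *root = make_node(1,
                            make_node(2, NULL, NULL),
                            make_node(3, NULL, NULL));

omp_set_dynamic(0);
#pragma omp parallel
{
    // Only the initial call is serial; the recursion itself runs in tasks
    #pragma omp single nowait
    traverse_and_make_tasks(root);
}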
There are many other possible scenarios for using tasks. One can also use them in a non-recursive case, e.g. to process containers that do not provide the random-access iterators required by the for worksharing construct:
typedef container_type::iterator citer;

container_type container;
... push some values in the container ...

#pragma omp parallel
{
    #pragma omp single nowait
    {
        for (citer it = container.begin(); it != container.end(); ++it)
            #pragma omp task
            process(*it);
    }

    #pragma omp taskwait

    // process more
    #pragma omp single nowait
    {
        for (citer it = container.begin(); it != container.end(); ++it)
            #pragma omp task
            process_more(*it);
    }
}
This example also illustrates the use of explicit task synchronisation inside the parallel region.
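For completeness: since MSVC is out of the picture anyway, such examples would typically be built with a compiler that implements OpenMP 3.0, e.g. GCC 4.4 or later (the source file name below is made up):

g++ -fopenmp task_examples.cpp -o task_examples
OMP_NUM_THREADS=4 ./task_examples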