Let me expand on my comment. Unless you need your code to also compile and work with MS Visual C++, you can avoid the complexity of providing "linear" iterators to graph objects by using explicit OpenMP tasks instead. Explicit tasking was introduced in OpenMP 3.0, hence it is not supported by MSVC, which even in 2012 only conforms to the much older OpenMP 2.0 specification. Tasks are blocks of code that can execute in parallel. They are created with the task construct:
... other code ...

#pragma omp task
{
    ... task code ...
}

... other code ...
Whenever the execution flow reaches the marked region, a new task object is created and put into a task queue. Then, at certain points in time, idle threads grab one task from the queue and start executing it. Tasks are much like OpenMP sections in that they inherit their data environment and that they can run in a different order than in the serial version of the code.
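To get a feel for this behaviour, here is a minimal self-contained sketch (the loop count and output format are of course made up for illustration). The parallel/single scaffolding that launches the tasks is explained further below:

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < 8; i++)
            {
                // Each created task may be picked up by any idle thread
                #pragma omp task firstprivate(i)
                printf("task %d executed by thread %d\n",
                       i, omp_get_thread_num());
            }
        } // implicit barrier here - all tasks complete before it is crossed
    }
    return 0;
}

Run with several threads, this typically prints the task numbers interleaved and spread over different thread IDs, i.e. the tasks really do run out of order.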
With tasks one can implement recursive algorithms and also easily work with C++ containers that do not provide random-access iterators. For example, traversal of a binary tree can be performed with tasks like this:
// Helper routine to traverse a tree and create one task for each node
void traverse_and_make_tasks(tree_node *tn)
{
    if (tn == NULL)
        return;

    // Create a task to process the node
    #pragma omp task
    {
        very_long_computation(tn->value);
    }

    traverse_and_make_tasks(tn->left);
    traverse_and_make_tasks(tn->right);
}
... main code ...

// Disable dynamic teams
omp_set_dynamic(0);

#pragma omp parallel
{
    #pragma omp single nowait
    {
        traverse_and_make_tasks(tree->root_node);
    }
}
(Disabling dynamic teams prevents the OpenMP run-time from being too smart and executing the parallel region with a single thread only.)
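As a quick sanity check (not part of the original code), one can print the team size from inside the region to verify that the run-time actually honours the request:

omp_set_dynamic(0);      // forbid the run-time from shrinking the team
omp_set_num_threads(4);  // the value 4 here is an arbitrary example
#pragma omp parallel
{
    #pragma omp single
    printf("team size: %d\n", omp_get_num_threads());
}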
This is a very common tasking pattern known as the single/serial producer. Whenever execution enters the parallel region, a single thread executes the code in the single construct. It calls traverse_and_make_tasks with the root node of the tree, and traverse_and_make_tasks walks the tree and creates one task for each node. The task construct only works when used inside a parallel region (static scoping) or in code called (directly or indirectly) from inside a parallel region (dynamic scoping). As traverse_and_make_tasks walks the tree, it produces tasks that get queued. At the end of the parallel region there is an implicit barrier, which is also a task scheduling point; roughly speaking, execution will not resume past the end of the region until all queued tasks have been executed. One can also put explicit synchronisation points inside the parallel region using #pragma omp taskwait. It works similarly to barrier - execution "blocks" until all child tasks of the current task have been processed.
Another common pattern is the parallel producer, where tasks are generated in parallel. The example code above can easily be transformed into a parallel producer with a simple modification of traverse_and_make_tasks:
void traverse_and_make_tasks(tree_node *tn)
{
    if (tn == NULL)
        return;

    #pragma omp task
    traverse_and_make_tasks(tn->left);

    #pragma omp task
    traverse_and_make_tasks(tn->right);

    // Process the node itself
    very_long_computation(tn->value);
}
This version of the code creates two tasks at each node - one that descends into the left subtree and one that descends into the right subtree. If this were serial code, it would traverse the tree from the bottom up. In the parallel case, however, task queueing results in a more or less top-to-bottom traversal.
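For illustration, the driver stays the same as in the single-producer case; only the root call runs serially, since every node then spawns its own tasks. The make_node helper below is hypothetical and assumes integer node values:

// Hypothetical helper to build a small test tree
tree_node *make_node(int value, tree_node *left, tree_node *right)
{
    tree_node *tn = new tree_node;
    tn->value = value;
    tn->left  = left;
    tn->right = right;
    return tn;
}

tree_node *root = make_node(1,
                            make_node(2, NULL, NULL),
                            make_node(3, NULL, NULL));

omp_set_dynamic(0);
#pragma omp parallel
{
    // Only the initial call is serial; the recursion itself runs in tasks
    #pragma omp single nowait
    traverse_and_make_tasks(root);
}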
There are many other possible scenarios for using tasks. One can also use them in a non-recursive case, e.g. to process containers that do not provide the random-access iterators required by the for worksharing construct:
typedef container_type::iterator citer;

container_type container;
... push some values in the container ...

#pragma omp parallel
{
    #pragma omp single nowait
    {
        for (citer it = container.begin(); it != container.end(); ++it)
            #pragma omp task
            process(*it);
    }

    #pragma omp taskwait

    // process more
    #pragma omp single nowait
    {
        for (citer it = container.begin(); it != container.end(); ++it)
            #pragma omp task
            process_more(*it);
    }
}
This example also illustrates the use of explicit task synchronisation inside the parallel region.
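For completeness: since MSVC is out of the picture anyway, such examples would typically be built with a compiler that implements OpenMP 3.0, e.g. GCC 4.4 or later (the source file name below is made up):

g++ -fopenmp task_examples.cpp -o task_examples
OMP_NUM_THREADS=4 ./task_examples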