OpenMP custom scheduling

Question

I have a parallel loop that can be parallelised easily. I am required to implement a custom scheduling option to the loop. To do this I will explicitly assign iterations to threads rather than use the standard parallel for region as in:

#pragma omp parallel for

Instead I just use:

#pragama omp parallel [declarations]

My code so far is as follows:

#define N 500
#pragma omp parallel 
{
int num_threads = omp_get_num_threads();
int thread = omp_get_thread_num();
int start = N*thread/num_threads;
int end = N*(thread+1)/num_threads;
   for (i=start; i<end; i++){
   /* LOOP */
   }
}

I need to apply a scheduling algorithm such that each thread is assigned a local set of iterations, which I believe I have already done. Each local set is split up into chunks which each thread executes. When the thread completes it's chunk then it finds the thread which has the most remaining chunks and begins executing those chunks. This process is repeated until no chunks remain. I'm struggling to get my head around how to begin this as I'm not entirely sure how to find out what the most loaded thread is and also how to I go about splitting the local set of iterations into chunks, whilst still remaining in the parallel region.

Im not entirely sure I got the question, do you have to implement yourself a load balance scheduling, or do you have to use OpenMP to do so? — Nadir, Apr 08 '18 at 22:39
It would probably be easier and more efficient to use a queue of work items (chunks) and some, possibly variable, number of worker threads. If `i` is an index into array, then all your queue has to have are `startIdx` and `count`. — jwdonahue, Apr 09 '18 at 00:16
I have to use OpenMP to implement an affinity scheduling option. @jwdonahue i is an index into an array that is correct. I'd imagine that method would work well. I have an array of start indexes now. But I'm stuck on how to allow threads to jump to other start points if complete. Or how to monitor progress. — Sigma-Cosine, Apr 09 '18 at 23:02
I went into some detail about implementing the OpenMP schedulers by hand [here](https://stackoverflow.com/a/30591616/2542702). You can probably use that to define a custom scheduler not implemented by OpenMP. — Z boson, Apr 10 '18 at 12:02
SImilar to the dynamic schedule except that we have N iterations for example. Then we divide this into equal chunks for the threads. Then we split these chunks further. What I want is when a thread finishes it's assigned chunks then it goes and works on other chunks. I think it's similar to dynamic except for the fact that the chunks are assigned at the start and the threads move to different chunks when necessary. — Sigma-Cosine, Apr 10 '18 at 14:02

score 1 · Answer 1 · answered Apr 09 '18 at 05:29

What you need to implement is a work stealing algorithm. It sounds like what you want is to assign each thread a double-ended queue (sometimes called a dequeue). If you're allowing any thread to steal work from any other thread, you will want a lock associated with each queue to prevent two threads from attempt to steal the same piece of work.

You will also want to put each of these queues into a priority queue using their size as the key, in order to allow finished threads to get the thread with the most work remaining to steal from. You will also want a lock for that global priority queue, so as to maintain its integrity as updates are performed.

All that being said, when usuable, a dynamic schedule for #pragma omp for or spawning tasks (via #pragma omp task) can be preferable. If you do implement your own work stealing algorithm, you keep in mind choosing which thread to steal from based on a metric which will be the same for all threads will often lead to collision (i.e., all idle threads will attempt to steal from the same queue). Consider choosing where to steal work based on locality instead.

Alternatively, you can use either a taskloop, or schedule(nonmonotonic:dynamic) in which case the OpenMP runtime can do iteration stealing... — Jim Cownie, Apr 09 '18 at 08:21
I believe I need to implement a work stealing algorithm. I've made two arrays which split the data up per thread and stores the start and end points. However I'm confused now as to how to split this up further. The way I'm thinking of implementing it is by splitting the array into further chunks. Then getting the threads to execute it's chunks then find the remaining chunks. Though I'm unsure how to monitor the progress so the threads know which threads are remaining? — Sigma-Cosine, Apr 09 '18 at 23:01

OpenMP custom scheduling

1 Answers1