I have a loop that can be parallelised easily, and I am required to implement a custom scheduling option for it. To do this, I will explicitly assign iterations to threads rather than use the standard parallel for construct, as in:
#pragma omp parallel for
Instead I just use:
#pragma omp parallel [declarations]
My code so far is as follows:
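#include <omp.h>
#define N 500
#pragma omp parallel
{
    /* Team size and this thread's id within the team */
    int num_threads = omp_get_num_threads();
    int thread = omp_get_thread_num();
    /* Contiguous block of iterations [start, end) owned by this thread */
    int start = N*thread/num_threads;
    int end = N*(thread+1)/num_threads;
    /* Declare i inside the loop so that each thread has a private copy */
    for (int i=start; i<end; i++){
        /* LOOP */
    }
}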
I need to apply a scheduling algorithm such that each thread is assigned a local set of iterations, which I believe I have already done. Each local set is split up into chunks, which the owning thread executes. When a thread has completed its own chunks, it finds the thread which has the most remaining chunks and begins executing those chunks. This process is repeated until no chunks remain. I'm struggling to get my head around how to begin, as I'm not sure how to find out which thread is the most loaded, or how to go about splitting the local set of iterations into chunks while still remaining in the parallel region.
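For illustration, the scheme described above could be sketched roughly as follows. This is only one possible approach, not a definitive implementation, and it rests on several assumptions that are not part of my original code: a fixed chunk size `CHUNK`, a shared array of per-thread ranges (`range_t`), the helper names `next_chunk` and `run_custom_schedule`, and a single named critical section guarding all updates. "Most loaded" is taken to mean the thread with the most remaining iterations, which gives the same ranking as remaining chunks when the chunk size is fixed.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without -fopenmp. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

#define N 500
#define CHUNK 8   /* hypothetical chunk size, not taken from the question */

/* Remaining iterations [lo, hi) of one thread's local set (hypothetical type). */
typedef struct { int lo, hi; } range_t;

/* Claim the next chunk for thread `me`: from its own local set while that is
 * non-empty, otherwise from the thread with the most remaining iterations.
 * Returns 1 and fills [*start, *end), or 0 when no work remains anywhere. */
static int next_chunk(range_t *sets, int nthreads, int me, int *start, int *end)
{
    int got = 0;
    #pragma omp critical(chunk_sets)
    {
        int victim = me;
        if (sets[me].lo >= sets[me].hi) {
            /* Own set is exhausted: scan for the most loaded thread. */
            int most = 0;
            for (int t = 0; t < nthreads; t++) {
                int left = sets[t].hi - sets[t].lo;
                if (left > most) { most = left; victim = t; }
            }
        }
        if (sets[victim].lo < sets[victim].hi) {
            /* Take one chunk from the front of the chosen set. */
            *start = sets[victim].lo;
            *end = *start + CHUNK;
            if (*end > sets[victim].hi) *end = sets[victim].hi;
            sets[victim].lo = *end;
            got = 1;
        }
    }
    return got;
}

/* Runs the loop over out[0..N-1]; returns 0 on success, -1 if allocation fails. */
static int run_custom_schedule(int *out)
{
    int max_threads = omp_get_max_threads();
    range_t *sets = malloc((size_t)max_threads * sizeof *sets);
    if (sets == NULL) return -1;

    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int me = omp_get_thread_num();
        assert(nthreads <= max_threads);

        /* Each thread publishes its local set, as in the original code. */
        sets[me].lo = N * me / nthreads;
        sets[me].hi = N * (me + 1) / nthreads;
        /* Every set must be initialised before any thread may steal. */
        #pragma omp barrier

        int start, end;
        while (next_chunk(sets, nthreads, me, &start, &end)) {
            for (int i = start; i < end; i++) {
                out[i] += 1;   /* placeholder for the real loop body */
            }
        }
    }
    free(sets);
    return 0;
}
```

Calling `run_custom_schedule(out)` on a zero-initialised `int out[N]` should leave every element equal to 1, since each iteration is claimed exactly once. A single critical section keeps the sketch simple but serialises all chunk claims; per-thread locks (for example `omp_lock_t`) would likely reduce contention if that became a bottleneck.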