Hi all, I have an array of length N that I'd like to divide as evenly as possible among 'size' processes. N is not a multiple of size, so N/size leaves a remainder, e.g. 1000 array elements divided among 7 processes, or 14 elements among 3 processes.
I'm aware of at least a couple of ways of work sharing in MPI, such as:
for (i = rank; i < N; i += size) { a[i] = DO_SOME_WORK; }
However, this does not divide the array into contiguous chunks, which is what I'd like, since I believe contiguous chunks are faster for IO reasons.
Another one I'm aware of is:
int count = N / size;
int start = rank * count;
int stop = start + count;
// now perform the loop
int nloops = 0;
for (int i = start; i < stop; ++i)
{
    a[i] = DO_SOME_WORK;
}
However, with this method, for my first example integer division gives count = 1000/7 = 142. The last rank (rank 6) therefore starts at 852 and stops at 994 (exclusive), so the last 6 elements, indices 994 to 999, are never processed.
Would the best solution be to append something like this to the previous code?
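To double-check that arithmetic, here is a small serial sketch (no MPI calls). The helper names `block_range` and `uncovered` are made up for illustration and are not part of any library; they just reproduce the start/stop computation from the snippet above.

```c
#include <assert.h>

/* Hypothetical helper: the [start, stop) range that the equal-chunk
   scheme above assigns to `rank` when n elements are split over
   `size` ranks. */
void block_range(int n, int size, int rank, int *start, int *stop)
{
    assert(size > 0 && rank >= 0 && rank < size);
    int count = n / size;      /* integer division drops the remainder */
    *start = rank * count;
    *stop  = *start + count;   /* exclusive upper bound */
}

/* Hypothetical helper: number of trailing elements that no rank touches. */
int uncovered(int n, int size)
{
    assert(n >= 0 && size > 0);
    return n - size * (n / size);   /* equals n % size for n >= 0 */
}
```

With N = 1000 and size = 7, rank 6 gets the range [852, 994) and `uncovered` returns 6; with N = 14 and size = 3 it returns 2.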
int remainder = N % size;
int tail_start = N - remainder; // first index not covered by the equal-sized chunks
if (rank == 0) {
    for (int i = tail_start; i < N; i++) {
        a[i] = DO_SOME_WORK;
    }
}
This seems messy, and if it's the best solution I'm surprised I haven't seen it elsewhere.
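To make the cost of that approach concrete, here is a serial sketch (again no MPI calls) of how many elements each rank would end up processing if rank 0 picks up the leftovers. The function name `elements_with_tail_on_rank0` is hypothetical and only used for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: elements processed by `rank` when every rank
   takes n / size elements and rank 0 additionally takes the
   n % size leftover elements at the end of the array. */
int elements_with_tail_on_rank0(int n, int size, int rank)
{
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);
    int count = n / size;          /* equal-sized chunk per rank */
    int remainder = n % size;      /* leftovers, always less than size */
    return count + (rank == 0 ? remainder : 0);
}
```

For N = 1000 and size = 7 this gives 148 elements on rank 0 and 142 on every other rank. Since the remainder is always smaller than size, rank 0 never does more than size - 1 extra elements under this scheme.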
Thanks for any help!