I'm implementing a distributed convolution of a greyscale image using MPI. My existing pattern is to read the image as a flattened 1D array at the root process, scatter it to all processes (row decomposition), convolve locally, then MPI_Gather the results
at the root process and write the image out again as a flattened 1D array. This doesn't give the expected results: the convolution at the first and last rows of each block needs rows that belong to a neighbouring process.
To improve on this, I want to implement the so-called ghost cell exchange
pattern, in which each process sends its boundary rows to its neighbours and stores the rows it receives in extra "ghost rows".
In pseudocode:
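if (rank == 0) {
    src = MPI_PROC_NULL      // no neighbour above the first block
    dest = rank + 1
} else if (rank == size - 1) {
    src = rank - 1
    dest = MPI_PROC_NULL     // no neighbour below the last block
} else {
    src = rank - 1
    dest = rank + 1
}
MPI_Sendrecv(&sendbuf[offset], slen, dest, ...
             &recvbuf[offset], rlen, src, ...);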
How do I allocate memory for the ghost rows on each process? Should I pre-allocate a buffer that includes them and then scatter into it? I don't want to use a custom datatype, since that seems like overkill for the scope of this problem.
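
To make the question concrete, here is a minimal sketch of the layout I have in mind. It is plain C with no MPI calls so that the index arithmetic can be checked on its own. All the names (local_rows, first_row, alloc_with_ghosts, interior, swap_ghost_rows) are hypothetical helpers I made up for illustration, and I am assuming one byte per pixel, a one-row halo (3x3 kernel) and zero padding at the image edges. swap_ghost_rows is only a serial stand-in for what the MPI_Sendrecv exchange between two neighbouring ranks would do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of rows owned by `rank` when `height` rows are split over `size`
   ranks; the first (height % size) ranks take one extra row. */
static int local_rows(int height, int size, int rank) {
    return height / size + (rank < height % size ? 1 : 0);
}

/* Index of the first image row owned by `rank` under the same split. */
static int first_row(int height, int size, int rank) {
    int base = height / size, extra = height % size;
    return rank * base + (rank < extra ? rank : extra);
}

/* Allocate (rows + 2) * width bytes:
     buffer row 0          top ghost row
     buffer rows 1..rows   rows owned by this rank
     buffer row rows + 1   bottom ghost row
   calloc zero-fills, so ghost rows that never receive data (top of the
   first block, bottom of the last block) behave as zero padding. */
static unsigned char *alloc_with_ghosts(int rows, int width) {
    assert(rows >= 0 && width > 0);
    return calloc((size_t)(rows + 2) * (size_t)width, 1);
}

/* Start of the owned rows; this is where scattered data would be received. */
static unsigned char *interior(unsigned char *buf, int width) {
    return buf + width;
}

/* Serial stand-in for the exchange between two vertically adjacent ranks:
   the upper rank's last owned row goes into the lower rank's top ghost row,
   and the lower rank's first owned row goes into the upper rank's bottom
   ghost row. */
static void swap_ghost_rows(unsigned char *upper, int upper_rows,
                            unsigned char *lower, int width) {
    size_t w = (size_t)width;
    memcpy(lower, upper + (size_t)upper_rows * w, w);
    memcpy(upper + (size_t)(upper_rows + 1) * w, lower + w, w);
}
```

In the real program I would presumably pass interior(buf, width) as the receive buffer of the scatter (MPI_Scatterv if the row counts are uneven, with counts of local_rows(...) * width and displacements of first_row(...) * width), and then fill buffer rows 0 and rows + 1 with two MPI_Sendrecv calls, one per direction. Is this the right way to go about it?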