
I'm trying to combine two arrays (each of length n) into the receiving buffer on the root process (rank=0) to form an array of length 2*n, i.e. a single array containing all the values.

For brevity, my code resembles the following:

#include <mpi.h>
#include <stdlib.h>

#define ROOT 0

int myFunction(int* rBuf, int n) {
  int* sBuf = malloc(n*sizeof(int));

  // Do work, calculate offset, count etc.

  MPI_Reduce(sBuf, rBuf+offset[rank], counts[rank], 
             MPI_INT, MPI_SUM, ROOT, MPI_COMM_WORLD);
}
// where offset[rank] is amount to offset where it is to be received
// offset[0] = 0, offset[1] = n
// counts contains the length of arrays on each process

However, when I check rBuf, the values have been reduced into the start of rBuf with no offset applied, for example:

// Rank 0: sBuf = {3, 2}
// Rank 1: sBuf = {5, 1}
// Expected: rBuf = {3, 2, 5, 1}
// Actual:   rBuf = {8, 3, 0, 0}

Additional info:

  • rBuf is allocated to the correct size and initialized with 0s prior to the reduce
  • All processes have the offset array
  • The reason for using MPI_Reduce was that, with rBuf zero-initialized, reducing with MPI_SUM should give the needed answer

I've looked up the documentation, some tutorials/guides online, and of course SO, and I still can't figure out what I'm doing wrong.

For an answer, I'm specifically looking for:

  1. Is this technically possible using MPI_Reduce?
  2. Is my MPI_Reduce call correct? (error in pointer arithmetic?)
  3. Is this feasible/good practice in MPI, or is there a better approach?

Thanks

xlm

2 Answers


Gather (and scatter) is described in some detail in this answer.

Reduce and Gather are related but different operations. When you called MPI_Reduce on these vectors

// Rank 0: sBuf = {3, 2}
// Rank 1: sBuf = {5, 1}

Reduce did exactly the right thing; it took the various sBufs and added them (because you told it to execute the operation MPI_SUM on the data), giving {8,3} == {3,2} + {5,1}, and putting the result in the root processor's receive buffer. (If you want everyone to have the answer afterwards, use MPI_Allreduce() instead.) But note that your call to Reduce,

 MPI_Reduce(sBuf, rBuf+offset[rank], counts[rank], 
             MPI_INT, MPI_SUM, ROOT, MPI_COMM_WORLD);

isn't actually valid; for Reduce, everyone needs to make the call with the same count. And the only rBuf that matters is the one on the root process, which in this case is rank 0.
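
To make that concrete, here's a minimal sketch of a valid Reduce call (not the code from the question; it hard-codes a two-rank run with the example values above): every rank passes the same count, and only the root's rBuf is significant.

// Minimal sketch, assuming exactly two ranks and the example values above.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int sBuf[2];
    int rBuf[2] = {0, 0};             // only examined on the root
    sBuf[0] = (rank == 0) ? 3 : 5;    // rank 0 sends {3, 2}, rank 1 sends {5, 1}
    sBuf[1] = (rank == 0) ? 2 : 1;

    // same count (2) on every rank
    MPI_Reduce(sBuf, rBuf, 2, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rBuf = {%d, %d}\n", rBuf[0], rBuf[1]);   // prints {8, 3}

    MPI_Finalize();
    return 0;
}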

Gather, on the other hand, also collects all of the data, but instead of collapsing it with a sum, product, xor, etc. operation, it concatenates the results.
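
Under the same two-rank setup as the sketch above (MPI initialized, sBuf filled per rank), swapping the Reduce for a plain Gather, which is enough here since every rank contributes the same count, gives the concatenated result rather than the sum; this fragment is illustrative only:

int rBuf[4] = {0, 0, 0, 0};           // only needs to be valid on the root

MPI_Gather(sBuf, 2, MPI_INT,          // each rank sends 2 ints
           rBuf, 2, MPI_INT,          // recvcount is per rank, used on the root
           0, MPI_COMM_WORLD);
// on rank 0: rBuf = {3, 2, 5, 1}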

Jonathan Dursi
  • I see, so since rBuf only matters on the root, then offset has no real effect? Since offset[ROOT]=0. – xlm Apr 12 '13 at 12:54

So I tried MPI_Gatherv and that seems to have fixed the problem, verified for a much larger number and size of arrays.

Here's what I did:

MPI_Gatherv(sBuf, counts[rank], MPI_INT, c, counts, offset, MPI_INT, 
            ROOT, MPI_COMM_WORLD);

I also tried MPI_Gather but that didn't work (it appeared to, but in reality passing the offset in a similar fashion to my reduce call had no actual effect).
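
For reference, here is a self-contained sketch of that working Gatherv pattern; the buffer names and two-rank counts/offsets are illustrative values matching the example in the question, not my actual code:

#include <mpi.h>
#include <stdio.h>

#define ROOT 0

int main(int argc, char** argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int counts[2] = {2, 2};            // number of ints contributed by each rank
    int offset[2] = {0, 2};            // displacement of each rank's data in rBuf
    int sBuf[2];
    int rBuf[4] = {0, 0, 0, 0};        // only significant on the root

    sBuf[0] = (rank == 0) ? 3 : 5;     // rank 0: {3, 2}, rank 1: {5, 1}
    sBuf[1] = (rank == 0) ? 2 : 1;

    MPI_Gatherv(sBuf, counts[rank], MPI_INT,
                rBuf, counts, offset, MPI_INT,
                ROOT, MPI_COMM_WORLD);

    if (rank == ROOT)
        printf("rBuf = {%d, %d, %d, %d}\n",
               rBuf[0], rBuf[1], rBuf[2], rBuf[3]);   // {3, 2, 5, 1}

    MPI_Finalize();
    return 0;
}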

From this, my understanding in relation to my specific questions is as follows:

  1. This isn't possible with, or at least isn't the intended use case for, MPI_Reduce
  2. The reduce call is thus incorrect; including the offset in the call has no effect
  3. The correct approach is to use MPI_Gatherv, since this library call specifically addresses displacements into the receiving buffer

It would be great if more experienced MPI users would like to weigh in.

Thanks

xlm