
I want to rotate data one position to the right in a circular fashion using MPI. That is, with 4 nodes: 1->2, 2->3, 3->4, 4->1. I am using Boost.MPI and have the following code to do it.

// Receive buffer with the same shape as the local matrix W
mat new_W(this->W.n_rows, this->W.n_cols);
// Right neighbour (wraps to rank 0) and left neighbour (wraps to the last rank)
int p_send = MPI_RANK + 1 >= MPI_SIZE ? 0 : MPI_RANK + 1;
int p_recv = MPI_RANK - 1 < 0 ? MPI_SIZE - 1 : MPI_RANK - 1;
// Post the non-blocking send and receive, then wait for both to complete
vector<boost::mpi::request> reqs;
reqs.push_back(this->world.isend(p_send, MAT_TAG, this->W));
reqs.push_back(this->world.irecv(p_recv, MAT_TAG, new_W));
boost::mpi::wait_all(ALL(reqs));

I have the following observations about the code above.

  1. For larger data sizes, `boost::mpi::all_gather` across all the nodes is faster than this right rotate. That is, everyone exchanging data with everyone else is faster than each node sending only to its neighbour. My all-gather routine is as follows.

    vector<mat> all_x;
    boost::mpi::all_gather(this->world, X, all_x);
    
  2. How can I make the above right rotate faster for large data packets? (A sketch of one possible approach follows below.)
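
One thing worth trying for large matrices, sketched below purely as an assumption (it presumes `mat` is a contiguous, double-precision Armadillo matrix exposing `memptr()` and `n_elem`, and that every rank's `W` has the same dimensions), is to bypass Boost.MPI's serialization and rotate the raw buffer with plain `MPI_Sendrecv`:

    #include <mpi.h>
    #include <armadillo>   // assumption: mat is arma::mat (contiguous column-major doubles)

    // Hypothetical sketch: rotate W one rank to the right without Boost serialization.
    // Assumes every rank holds a W of identical dimensions.
    arma::mat rotate_right(const arma::mat& W, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        const int p_send = (rank + 1) % size;            // right neighbour
        const int p_recv = (rank - 1 + size) % size;     // left neighbour

        arma::mat new_W(W.n_rows, W.n_cols);
        const int count = static_cast<int>(W.n_elem);

        // Combined send/receive cannot deadlock and lets MPI stream the buffer directly.
        MPI_Sendrecv(W.memptr(),     count, MPI_DOUBLE, p_send, /*tag*/ 0,
                     new_W.memptr(), count, MPI_DOUBLE, p_recv, /*tag*/ 0,
                     comm, MPI_STATUS_IGNORE);
        return new_W;
    }

Whether this helps depends on how much of the time in the Boost version is spent packing `mat` into an archive versus actually moving bytes.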

  • Can you share how you are calling `MPI_Allgather`? Specifically, are you using `MPI_Allgather()` or `boost::mpi::all_gather()`, and how are you passing your matrix data structure? Your issue could very well be related to serialization overhead. – Patrick Jun 02 '15 at 17:41
  • @Patrick: I have edited the question above to include the `boost::mpi::all_gather` call. – Ramakrishnan Kannan Jun 02 '15 at 19:31
  • What is large? I implemented this in Fortran with standard OpenMPI, and at around 2000 elements per process the non-blocking `isend`/`irecv` gets faster than `all_gather`. At 50 million elements per process, `all_gather` takes around 4-5 times longer. – Stefan Jun 24 '15 at 12:36
  • I've never used boost::mpi, so I don't know how it behaves internally. However, good practice for P2P MPI communications is to post the receiving requests before posting the corresponding sending ones (this avoids, as much as possible, unexpected messages, which take longer for the MPI library to process). Would you mind trying again with the `isend()` and `irecv()` lines swapped? – Gilles Nov 25 '15 at 15:26
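
A minimal sketch of that last suggestion, assuming the same member variables as the code in the question (`world`, `W`, `MAT_TAG`, and the `ALL` macro expanding to begin/end iterators), simply posts the `irecv()` before the `isend()`:

    mat new_W(this->W.n_rows, this->W.n_cols);
    int p_send = MPI_RANK + 1 >= MPI_SIZE ? 0 : MPI_RANK + 1;
    int p_recv = MPI_RANK - 1 < 0 ? MPI_SIZE - 1 : MPI_RANK - 1;
    vector<boost::mpi::request> reqs;
    // Post the receive first so it is already matched when the neighbour's message
    // arrives, avoiding the slower unexpected-message path in the MPI library.
    reqs.push_back(this->world.irecv(p_recv, MAT_TAG, new_W));
    reqs.push_back(this->world.isend(p_send, MAT_TAG, this->W));
    boost::mpi::wait_all(ALL(reqs));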
