
I'm new to programming in general, but especially MPI. I'm trying to scatter multiple arrays from the root processor to the other processors, perform some operations on those arrays, then gather the data. However, it's scattering all the data to all the processors and the output adjacency matrices aren't correct, so I'm assuming it's because I'm using scatterv and/or gatherv incorrectly. I'm not sure if I should scatter the matrices element by element or if there is a way to scatter an entire matrix. If you could take a look at my code, any help would be much appreciated. Thanks!

int rank, size;
MPI_Status status;
MPI_Datatype strip;
bool passflag[Nmats];


MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
int sendcounts[size], recvcounts, displs[size], rcounts[size];

if(rank == root){

    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }
}
/* Nmats = Number of matrices; N = nodes; Nmats isn't divisible by the number of processors */

Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
    if( i < Nextra) sendcounts[i] = Nmin + 1;
    else sendcounts[i] = Nmin;
    displs[i] = k;
    k = k + sendcounts[i];
}
recvcounts = sendcounts[rank];
MPI_Type_vector(Nmin, N, N, MPI_FLOAT, &strip);
MPI_Type_commit(&strip);

MPI_Scatterv(a, sendcounts, displs, strip, a, N*N, strip, 0, MPI_COMM_WORLD);

/* Perform operations on adj-mats */

for(i=0; i<size; i++){
    if(i<Nextra) rcounts[i] = Nmin + 1;
    else rcounts[i] = Nextra;
    displs[i] = k;
    k = k + rcounts[i];

}


MPI_Gatherv(&passflag, 1, MPI::BOOL, &passflag, rcounts , displs, MPI::BOOL, 0, MPI_COMM_WORLD);

MPI::Finalize();
//OUTPUT ADJ_MATS
for(i = 0; i < Nmats; i++) if (passflag[i]) {
    for(j=0;j<N; j++){
        for(k=0; k<N; k++){
            fout << a[i][j][k] << " ";
        }
        fout << endl;
    }
    fout << endl;
}
fout << endl;

Hi, I was able to get the code working with static allocation, but it more or less "broke" when I tried to allocate the arrays dynamically. I'm not sure whether I need to allocate the memory outside of MPI or whether this is something I should do after I initialize MPI. Any suggestions would be much appreciated!

//int a[Nmats][N][N];

/* Prior to adding this part of the code it ran fine, now it's no longer working */ 
int ***a = new int**[Nmats];
for(i = 0; i < Nmats; ++i){
    a[i] = new int*[N];
    for(j = 0; j < N; ++j){
        a[i][j] = new int[N];
        for(k = 0; k < N; k++){
            a[i][j][k] = 0;
        }
    }
}

int rank, size;
MPI_Status status;
MPI_Datatype plane;
bool passflag[Nmats];


MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
MPI_Type_contiguous(N*N, MPI_INT, &plane);
MPI_Type_commit(&plane);

int counts[size], recvcounts, displs[size+1];

if(rank == root){

    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }
}


Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
    if( i < Nextra) counts[i] = Nmin + 1;
    else counts[i] = Nmin;
    displs[i] = k;
    k = k + counts[i];
}
recvcounts = counts[rank];
displs[size] = Nmats;

MPI_Scatterv(&a[displs[rank]][0][0], counts, displs, plane, &a[displs[rank]][0][0], recvcounts, plane, 0, MPI_COMM_WORLD);

/* Perform operations on matrices */

MPI_Gatherv(&passflag[displs[rank]], counts, MPI::BOOL, &passflag[displs[rank]], &counts[rank], displs, MPI::BOOL, 0, MPI_COMM_WORLD);

MPI_Type_free(&plane);  
MPI::Finalize();
  • *I'm new to programming in general but especially MPI* So start simple! Get your head around scatter/gather and 1D arrays first. – High Performance Mark Jul 08 '14 at 13:44
  • Following @HighPerformanceMark's advice - if a is allocated contiguously (this is important!) it's probably easiest to treat it as a 1d array to get started, and send `sendcounts[i]*N*N` floats to each processor. The `strip` vector you create doesn't make a lot of sense, and the receive count in the scatter is incorrect - I'm surprised your MPI implementation isn't giving you an error. There are many related questions here on SO, several listed on the side bar. – Jonathan Dursi Jul 08 '14 at 13:51
  • I don't have time to "start simple", my ass is going to get fired, but thanks @JonathanDursi for the advice – Jul 08 '14 at 14:45
  • There are other good starting points, but [my answer here](http://stackoverflow.com/questions/9269399/sending-blocks-of-2d-array-in-c-using-mpi/9271753#9271753) is a decent place to start. You need to make sure (a) the array is contiguously allocated, (b) the type describes the part of the array you want to send, (c) the type has its extent set correctly so that the scatter/gather operation makes sense, and (d) the receive count/type matches the array you're trying to receive into. – Jonathan Dursi Jul 08 '14 at 15:40
  • In addition to what Jonathan Dursi already noted, you are mixing C++ and C MPI constructs (e.g. passing `MPI::BOOL` to `MPI_Gatherv`) and this might not work as expected. The C++ interface was **deleted** from MPI-3.0 and should not be used. Using `MPI_Gatherv` with the same array for source and destination is erroneous (even overlapping arrays are erroneous) - there is a special in-place mode for that. The same applies to `MPI_Scatterv`. – Hristo Iliev Jul 08 '14 at 18:52
  • Hmm, SO promoted this question to the front page again as a result of your edit. I'm intrigued, did your ass get fired yet? – High Performance Mark Aug 05 '14 at 16:18
  • @HighPerformanceMark Yes exactly, that's why I'm still working on this project. Thank you once again for the useful response. You're the best! xoxo – Aug 05 '14 at 18:26
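
Regarding point (a) in the comments above - a minimal sketch (not from the original posts; the name `data` is purely illustrative) of allocating `a` as one contiguous block while keeping the `a[i][j][k]` indexing, assuming `int` elements as in the edited code:

// All Nmats*N*N ints live in a single contiguous block; the pointer
// arrays only provide the a[i][j][k] syntax on top of that block.
int *data = new int[Nmats * N * N]();      // zero-initialised storage
int ***a = new int**[Nmats];
for (int i = 0; i < Nmats; ++i) {
    a[i] = new int*[N];
    for (int j = 0; j < N; ++j)
        a[i][j] = data + (i * N + j) * N;  // row j of matrix i
}

With such a layout, `data` (equivalently `&a[0][0][0]`) can be handed to MPI_Scatterv/MPI_Gatherv as a single contiguous buffer.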

1 Answer


It appears that what you have in a are actually Nmats planes of N x N elements each. The way you index a while filling its elements in the nested loops shows that these matrices are laid out contiguously in memory. Therefore you should treat a as an array of Nmats elements, with each element being a compound of N*N values. You just have to register a contiguous type that spans the memory of a single matrix:

MPI_Type_contiguous(N*N, MPI_FLOAT, &plane);
MPI_Type_commit(&plane);
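
As a quick sanity check (a sketch, not part of the original answer; `assert` comes from `<cassert>`), the extent of this type should equal the byte footprint of one N x N plane, because the counts and displacements in the scatter advance in units of that extent:

MPI_Aint lb, extent;
MPI_Type_get_extent(plane, &lb, &extent);
// The scatter steps through the send buffer in multiples of `extent`,
// so it has to match the size of a single matrix.
assert(extent == static_cast<MPI_Aint>(N * N * sizeof(float)));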

Scattering the data without using an additional array at the root is done using the in-place mode of the scatter operation:

// Perform an in-place scatter
if (rank == 0)
   MPI_Scatterv(a, sendcounts, displs, plane,
                MPI_IN_PLACE, 0, plane, 0, MPI_COMM_WORLD);
   //                         ^^^^^^^^ ignored because of MPI_IN_PLACE
else
   MPI_Scatterv(a, sendcounts, displs, plane,
   //           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks
                a, sendcounts[rank], plane, 0, MPI_COMM_WORLD);
   //              ^^^^^^^^^^^^^^^^ !!!

Note that each rank must specify the correct number of planes it should receive by passing the corresponding element of sendcounts[] as its receive count (in your code the receive count was hard-coded to N*N).

The in-place mode should be used in the gather operation too (shown here with MPI_CXX_BOOL, since MPI::BOOL belongs to the C++ bindings that were removed in MPI-3.0):

if (rank == 0)
   MPI_Gatherv(MPI_IN_PLACE, 0, MPI_CXX_BOOL,
   //                        ^^^^^^^^^^^^^^^ ignored because of MPI_IN_PLACE
               passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
else
   MPI_Gatherv(passflag, rcounts[rank], MPI_CXX_BOOL,
   //                    ^^^^^^^^^^^^^ !!!
               passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
   //          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks

Note that rcounts and sendcounts have essentially the same values and you don't have to compute them twice. Simply call the array counts and use it in both the MPI_Scatterv and the MPI_Gatherv calls. The same applies to the values of displs - do not compute them twice as they are the same. You also don't seem to reset k to zero before the second computation (although this might simply not be shown in the code posted here).
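
For illustration, a minimal sketch of computing the shared arrays once (the vector-based declarations and names here are just one way to write it; `Nmats` and `size` are assumed to be set already):

#include <vector>

// Computed once and reused as the counts/displs arguments of both
// MPI_Scatterv and MPI_Gatherv.
std::vector<int> counts(size), displs(size);
int offset = 0;                 // running offset, i.e. k reset to zero
for (int i = 0; i < size; ++i) {
    counts[i] = Nmats / size + (i < Nmats % size ? 1 : 0);  // Nmin, plus 1 for the first Nextra ranks
    displs[i] = offset;
    offset += counts[i];
}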

Hristo Iliev