I am trying to parallelize a for loop that is sandwiched between two other for loops. After the data (a 3D array) is calculated on each processor, I want to collect it back on the root node for further processing. I tried using the MPI_Gather function to bring the data back to the root node. With this function, the data computed on the root processor is collected, but the data from the other processors is not.
#include <iostream>
#include <mpi.h>
using namespace std;

void Allocate_3D_R(long double***& m, int d1, int d2, int d3);

int main(int argc, char * argv[]) {
    int i, k, l;
    int Np = 7, Nz = 7, Nr = 4;
    int mynode, totalnodes;
    long double ***k_p, ***k_p1;
    int startvalp, endvalp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);

    // Allocation of memory
    Allocate_3D_R(k_p, (Nz+1), (Np+1), (Nr+1));
    Allocate_3D_R(k_p1, (Nz+1), (Np+1), (Nr+1));

    // startvalp is the local starting value of k for each processor,
    // endvalp the local ending value
    startvalp = (Np+1)*mynode/totalnodes;
    endvalp = startvalp + ((Np+1)/totalnodes) - 1;

    for (l = 0; l <= 1; l++) {
        // k loop parallelized between the processors
        // original loop: for (k = 0; k <= Np; k++)
        for (k = startvalp; k <= endvalp; k++) {
            for (i = 0; i <= 1; i++) {
                k_p[i][k][l] = l + k + i;
            }
        }
    }

    // For Np = 7 and two processors:
    // k = 0 - 3 is calculated on processor 0;
    // k = 4 - 7 is calculated on processor 1.
    // Now I need to collect the values of k_p from processor 1
    // back to the root processor.
    // The MPI_Gather function is used.
    for (l = 0; l <= 1; l++) {
        for (k = startvalp; k <= endvalp; k++) {
            for (i = 0; i <= 1; i++) {
                MPI_Gather(&(k_p[i][k][l]), 1, MPI_LONG_DOUBLE,
                           &(k_p1[i][k][l]), 1, MPI_LONG_DOUBLE,
                           0, MPI_COMM_WORLD);
            }
        }
    }

    // With this, k_p is collected from the root processor and stored
    // in the k_p1 variable, but from the other processors it is not
    // collected back to the root processor.
    if (mynode == 0) {
        for (l = 0; l <= 1; l++) {
            for (k = 0; k <= Np; k++) {
                for (i = 0; i <= 1; i++) {
                    cout << "Processor " << mynode;
                    cout << ": k_p[" << i << "][" << k << "][" << l << "] = "
                         << k_p1[i][k][l] << endl;
                }
            }
        }
    }

    MPI_Finalize();
} // end of main

void Allocate_3D_R(long double***& m, int d1, int d2, int d3) {
    m = new long double**[d1];
    for (int i = 0; i < d1; ++i) {
        m[i] = new long double*[d2];
        for (int j = 0; j < d2; ++j) {
            m[i][j] = new long double[d3];
            for (int k = 0; k < d3; ++k) {
                m[i][j][k] = 0.0;
            }
        }
    }
}
Here is the output:
Processor 0: k_p[0][0][0] = 0
Processor 0: k_p[1][0][0] = 1
Processor 0: k_p[0][1][0] = 1
Processor 0: k_p[1][1][0] = 2
Processor 0: k_p[0][2][0] = 2
Processor 0: k_p[1][2][0] = 3
Processor 0: k_p[0][3][0] = 3
Processor 0: k_p[1][3][0] = 4
Processor 0: k_p[0][4][0] = 0
Processor 0: k_p[1][4][0] = 0
Processor 0: k_p[0][5][0] = 0
Processor 0: k_p[1][5][0] = 0
Processor 0: k_p[0][6][0] = 0
Processor 0: k_p[1][6][0] = 0
Processor 0: k_p[0][7][0] = 0
Processor 0: k_p[1][7][0] = 0
Processor 0: k_p[0][0][1] = 1
Processor 0: k_p[1][0][1] = 2
Processor 0: k_p[0][1][1] = 2
Processor 0: k_p[1][1][1] = 3
Processor 0: k_p[0][2][1] = 3
Processor 0: k_p[1][2][1] = 4
Processor 0: k_p[0][3][1] = 4
Processor 0: k_p[1][3][1] = 5
Processor 0: k_p[0][4][1] = 0
Processor 0: k_p[1][4][1] = 0
Processor 0: k_p[0][5][1] = 0
Processor 0: k_p[1][5][1] = 0
Processor 0: k_p[0][6][1] = 0
Processor 0: k_p[1][6][1] = 0
Processor 0: k_p[0][7][1] = 0
Processor 0: k_p[1][7][1] = 0
The data from the root processor is transferred, but not from the other processor.
I tried using the MPI_Send and MPI_Recv functions instead and did not encounter this problem, but for large loop sizes it takes much more time.
Can anyone provide a solution to the above problem?