Help!

I am running my MPI code and get the runtime error "ONE OF THE PROCESS TERMINATED BADLY: CLEANING UP...process manager error waiting for completion". How can I find out which process (rank) caused the error?

What's more, the code runs fine with 4x4 (4 machines with 4 processes each), but it fails with 4x6 or more (e.g. 4x8).

The logic of my reduce code is below (the MPI calls are commented out and replaced by printf):

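#include <stdio.h>

/* Simulates the send/receive pattern of a tree reduce for one process. */
int main(void)
{
    int num, rank;   /* number of processes, rank of this process */
    scanf("%d %d", &num, &rank);
    int depth = 1;   /* distance between partners, in actual ranks */
    int flag = 0;    /* 1 if the current round has an odd process count */
    while (num > 1) {
        if (rank < num) {
            flag = num % 2;
            if (rank % 2 != 0) {
                /* Odd rank in this round: send to the left neighbour, then stop. */
                //MPI_Send(to (rank-1)*depth);
                printf("Send to %d\n", (rank - 1) * depth);
                rank *= num;
                break;
            }
            else {
                /* Even rank: receive, unless it is the unpaired last rank. */
                if (!(flag && (rank == (num - 1)))) {
                    //MPI_Recv(from (rank+1)*depth);
                    printf("Recv from %d\n", (rank + 1) * depth);
                }
                rank /= 2;
            }
            depth *= 2;
        }
        num = num / 2 + flag;
    }
    return 0;
}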

Thank you!

xunzhang

1 Answer

If the problem is caused by an MPI error, e.g. you try to send messages to ranks that do not exist, you can create your own MPI error handler with MPI_Comm_create_errhandler and attach it to the communicator with MPI_Comm_set_errhandler. In the handler you can print the rank of the process that raised the error. Nevertheless, you should run your code in a debugger to get to the bottom of the problem.
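
Whether any rank sends to, or waits on, a rank that does not exist can also be checked without MPI. The following is only a sketch: it replays the loop from the question for every rank and reports sends and receives that have no matching partner. The helper names reduce_schedule, expects_from and check_schedule are made up for this illustration, and the replay assumes 0 <= rank < num, so the question's `rank < num` guard is always true and is left out.

```c
#include <stdio.h>

#define MAX_ROUNDS 32  /* enough rounds for any int-sized process count */

/* Hypothetical helper: replays the question's loop for one process.
 * Stores the actual ranks it receives from in recv_from[] and the rank it
 * sends to in *send_to (-1 if it never sends, i.e. it is the root).
 * Assumes 0 <= rank < num. Returns the number of receives. */
int reduce_schedule(int num, int rank, int *send_to, int recv_from[MAX_ROUNDS])
{
    int depth = 1, n_recv = 0;
    *send_to = -1;
    while (num > 1) {
        int flag = num % 2;
        if (rank % 2 != 0) {
            *send_to = (rank - 1) * depth;
            break;
        }
        if (!(flag && rank == num - 1) && n_recv < MAX_ROUNDS)
            recv_from[n_recv++] = (rank + 1) * depth;
        rank /= 2;
        depth *= 2;
        num = num / 2 + flag;
    }
    return n_recv;
}

/* Hypothetical helper: returns 1 if `receiver` posts a receive from `sender`. */
int expects_from(int size, int receiver, int sender)
{
    int dest, from[MAX_ROUNDS];
    int n = reduce_schedule(size, receiver, &dest, from);
    for (int i = 0; i < n; i++)
        if (from[i] == sender)
            return 1;
    return 0;
}

/* Counts sends and receives whose partner is out of range or does not match. */
int check_schedule(int size)
{
    int problems = 0;
    for (int r = 0; r < size; r++) {
        int dest, from[MAX_ROUNDS];
        int n = reduce_schedule(size, r, &dest, from);
        if (dest != -1 && (dest < 0 || dest >= size || !expects_from(size, dest, r))) {
            printf("rank %d: unmatched send to %d\n", r, dest);
            problems++;
        }
        for (int i = 0; i < n; i++) {
            int peer_dest = -1, peer_from[MAX_ROUNDS];
            if (from[i] < size)
                reduce_schedule(size, from[i], &peer_dest, peer_from);
            if (from[i] >= size || peer_dest != r) {
                printf("rank %d: unmatched receive from %d\n", r, from[i]);
                problems++;
            }
        }
    }
    return problems;
}
```

Calling check_schedule(24) covers the 4x6 case; it prints each mismatched pair and returns how many it found. Under the stated assumption the replay finds no mismatches for 16, 24 or 32 processes, so a failure at those sizes may originate outside the pairing logic (for example in buffers, counts, or tags), which is where the debugger comes in.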

Thomas W.