
I ran into an issue when sending a very large message through MPI_Send: there are multiple processes, and the total number of ints we need to transfer is 2^25. When I test with a size of 1000 my code works fine, but when I set it to the size the professor asked for, it gets stuck for a long time and then returns information like this:

    2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
    Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    mpiexec noticed that process rank 0 with PID 0 on node srv-p22-13 exited on signal 24 (CPU time limit exceeded).

I put a "cout" after each line of code, and I am sure it gets stuck before the MPI_Send line; the size of Si is over 20,000,000. I am not sure whether that is the reason. But I have read that the maximum count for MPI_Send is 2^31-1 (the count argument is an int), which is far larger than 2^25... so I am confused.

This is the main part of my code:

//This is the send part
    for(int i=0; i<5; i++){
        if(i!=my_rank){//my_rank is from MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)
            int n = A.size();//A is a vector of int
            int* Si = new int[n];//convert the vector to an int array
            std::copy(A.begin(),A.end(),Si);
            MPI_Send(Si, n, Type, i, my_rank, MPI_COMM_WORLD);//**The code gets stuck here and says CPU time limit exceeded
            delete[] Si;
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);//I want all the processes to finish the send part, then start receiving and saving into the vector

//This is the receive part
    for(int i=0; i<5; i++){
        if(i!=my_rank){
            MPI_Status status;
            MPI_Probe(i,i,MPI_COMM_WORLD,&status);
            int rn = 0;
            MPI_Get_count(&status, Type, &rn);
            int* Ri = new int[rn];
            MPI_Recv(Ri, rn, Type, i, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /*Save the received elements into vector A*/
            for(int j=0; j<rn; j++){
                A.push_back(Ri[j]);
            }
            delete[] Ri;

        }
    }
  • @rustyx the size is 5. Thank you for mentioning it; I will add the necessary values. – Konan Nov 07 '19 at 22:45
  • Could you post more code, particularly the part that contains the receive? It may be that for short arrays MPI uses the *buffered* send, which hides some deadlock. With larger arrays the *synchronous* send is used, which then blocks the program due to the wrong order of send/receive calls. – jacob Nov 07 '19 at 23:55
  • @jacob Thanks for your advice. I am new to Stack and saw that it says not to paste the entire code, so I thought posting only the part with the issue would be ok... But you are right, the mistake may be in the structure; I have added the receive part. – Konan Nov 08 '19 at 00:20
  • @jacob I just took a look and it does look similar, thank you for sharing this reference. I will go through it and see if it solves my problem. Thank you very much! – Konan Nov 08 '19 at 00:49
  • @jacob That really solved my problem, thank you!!! – Konan Nov 08 '19 at 01:16

1 Answer


Thank you so much to @jacob for sharing a link to a similar question. After reading it, I realized I had made the same mistake: the processes cannot ALL SEND at the same time, so I switched to MPI_Sendrecv, following this question: MPI hangs on MPI_Send for large messages
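
Here is a rough sketch of the pattern I ended up with (the helper name exchange_all is just for illustration, and I use MPI_INT where my code had Type; this is not the exact assignment code). Each pair of ranks first exchanges sizes, then exchanges the data. Because MPI_Sendrecv posts the send and the matching receive together, no rank sits in a send loop waiting for receivers that were never posted:

    // Sketch only: exchange a vector<int> with every other rank using MPI_Sendrecv.
    // The original "everyone sends first, everyone receives later" pattern deadlocks
    // once the messages are too big for the eager/buffered path; combining send and
    // receive in one call avoids that ordering problem.
    #include <mpi.h>
    #include <vector>

    void exchange_all(std::vector<int>& A, int my_rank, int nprocs)
    {
        std::vector<int> received;                  // everything collected from the other ranks
        for (int i = 0; i < nprocs; ++i) {
            if (i == my_rank) continue;

            // 1) Exchange sizes so each side knows how many ints to expect.
            int send_n = static_cast<int>(A.size());
            int recv_n = 0;
            MPI_Sendrecv(&send_n, 1, MPI_INT, i, 0,
                         &recv_n, 1, MPI_INT, i, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            // 2) Exchange the actual data in one combined call.
            std::vector<int> buf(recv_n);
            MPI_Sendrecv(A.data(), send_n, MPI_INT, i, 1,
                         buf.data(), recv_n, MPI_INT, i, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            received.insert(received.end(), buf.begin(), buf.end());
        }
        // Append the received elements to A afterwards, as in my original receive loop.
        A.insert(A.end(), received.begin(), received.end());
    }

A side benefit: since a std::vector stores its elements contiguously, A.data() can be passed to MPI directly, so the extra new int[n] copy from my original code is not needed.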
