
I'm trying to exit my program gracefully if RdInput returns an error.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MASTER 0
#define Abort(x) MPI_Abort(MPI_COMM_WORLD, x)
#define Bcast(send_data, count, type) MPI_Bcast(send_data, count, type, MASTER, GROUP) //root --> MASTER
#define Finalize() MPI_Finalize()

int main(int argc, char **argv){

  //Code

  if( rank == MASTER ) {
    time (&start);
    printf("Initialized at %s\n", ctime (&start) );      
    //Read file
    error = RdInput();
  }

  Bcast(&error, 1, INT); Wait();

  if( error == 1 ) Abort(1);

  //Code

  Finalize();
}

Program output:

mpirun -np 2 code.x 
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Initialized at Wed May 30 11:34:46 2012
Error [RdInput]: The file "input.mga" is not available!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7369 on
node einstein exiting improperly. There are two reasons this could occur:

//More error message.

What can I do to gracefully exit an MPI program without printing this huge error message?

Fabricio
  • Where have you hidden MPI_Init? – High Performance Mark May 30 '12 at 15:00
  • All the code you have shared is irrelevant to the problem. All the code that is relevant to the problem is missing. We don't know how you initialize or finalize MPI, and we don't even know where `rank` is defined, or how you populate it. Right now, it looks like MPI is never even initialized. – ArjunShankar May 30 '12 at 15:06
  • A totally off topic suggestion: Consider dropping the large banner. You could have a switch like `--help` and include the list of authors when you print help information, or you could include the list of authors in the `man` page for your program. Somebody may want to use the output of your program in a script, and the banner will be a pain in the neck then. - As an example, imagine what would happen if the `cat` program started by printing a banner whenever you ask it to `cat` a file. – ArjunShankar May 30 '12 at 15:10
  • The banner is pretty common at the start of scientific software, particularly if it prints version numbers or details of the run so that that information is then stored in the program output. – Jonathan Dursi May 30 '12 at 15:50

1 Answer


If you have this logic in your code:

Bcast(&error, 1, INT);
if( error == 1 ) Abort(1);

then you're almost done. You don't need any kind of wait after a broadcast, though: MPI_Bcast() is blocking, so once it returns on a rank, error already holds the master's value. The catch, as you've discovered, is that MPI_Abort() is not "graceful"; it exists to shut everything down by whatever means possible when something has gone badly wrong.

In this case, since now everyone agrees on the error code after the broadcast, just do a graceful end of your program:

   MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
   if (error != 0) {
       if (rank == 0) {
           fprintf(stderr, "Error: Program terminated with error code %d\n", error);
       }
       MPI_Finalize();
       exit(error);
   } 

It's erroneous to keep making MPI calls after MPI_Finalize(), but that's not what you're doing here, since the program exits right away, so you're fine.

Jonathan Dursi
  • Thanks, it works, but it still prints "mpirun noticed that the job aborted, but has no info as to the process that caused that situation." I think this is a default MPI message; I'm using Open MPI. – Fabricio May 31 '12 at 11:54
  • In this case, not all the MPI processes got to the Finalize/exit at the same time; some were still elsewhere. It shouldn't be necessary, but put a barrier after the broadcast and see whether it's just a matter of being a bit out of sync or whether some processes really are stuck somewhere. – Jonathan Dursi May 31 '12 at 12:13
  • There should be a semicolon at the end of the fprintf line. – JC1 Jun 20 '17 at 18:26
  • Thanks, @JC1. Fixed. – Jonathan Dursi Jun 20 '17 at 18:58
  • @Fabricio, does this really answer the question? That is "exit an MPI program [with error code] without *printing this huge error message*." – alfC Nov 08 '21 at 02:32