MPI - Printing in an order

Question

I'm trying to write a function in C where every processor prints its own data. Here is what I have:

void print_mesh(int p,int myid,int** U0,int X,int Y){
    int i,m,n;
    for(i=0;i<p;i++){
        if(myid==i){
            printf("myid=%d\n",myid);
            for(n=0;n<X;n++){
                for(m=0;m<Y;m++){
                    printf("%d ",U0[n][m]);
                }
                printf("\n");
            }
        }
        else MPI_Barrier(MPI_COMM_WORLD);
    }
}

It doesn't work for some reason. The arrays are printed all mixed up. Do you have any insight as to why this doesn't work? Any other ideas that work? If possible, I don't want to send the whole array to a master process. Also, I don't want to use precompiled functions.

Wesley Bland is right; there's no general way to do it like this because of buffering. I use the same approach as you in example code with small outputs all the time, and as a practical matter it generally works quite well, but there's no guarantee, and it certainly won't work with sizeable amounts of output (more than a single I/O buffer). Best is to use MPI-IO to write to a file (e.g., [this answer](http://stackoverflow.com/a/9810006/463827)), with the same caveat that large amounts of data are best written in binary formats. — Jonathan Dursi, Jul 10 '13 at 14:35
Note also that you are calling `MPI_Barrier` within `MPI_COMM_WORLD` and in each round one rank in `MPI_COMM_WORLD` fails to call it. The call to `MPI_Barrier` should NOT be in the `else` block of the conditional construct, i.e. remove the `else` keyword. — Hristo Iliev, Jul 10 '13 at 16:30

score 7 · Accepted Answer · answered Jul 10 '13 at 13:18

There is no way to guarantee that messages from many different processes will arrive in the "correct" order when they arrive at another process. This is essentially what is happening here.

Even though you aren't explicitly sending messages, when you print something to the screen, it has to be sent to the process on your local system (mpiexec or mpirun) where it can be printed to the screen. There is no way for MPI to know what the correct order for these messages is so it just prints them as they arrive.

If you require that your messages are printed in a specific order, you must send them all to one rank which can print them in whatever order you like. As long as one rank does all of the printing, all of the messages will be ordered correctly.
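A minimal sketch of the first half of that approach, under some assumptions: before a rank can hand its output to the printing rank, it needs the text in one contiguous buffer. The helper name `format_mesh` is hypothetical (it is not from the question), the worst-case size estimate assumes a 32-bit `int`, and the MPI transfer itself is only indicated in a comment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format an X-by-Y mesh into one newly allocated, NUL-terminated string,
 * using the same layout as the question's printf calls.
 * Returns NULL on allocation failure; the caller frees the result.
 *
 * In an MPI program, ranks other than 0 could pass the result to MPI_Send
 * as strlen(buf) + 1 elements of MPI_CHAR, and rank 0 could MPI_Recv the
 * strings from ranks 1..p-1 in that order and print each one. */
char *format_mesh(int myid, int **U0, int X, int Y)
{
    /* Assumption: int is 32 bits, so one value needs at most 11 characters
     * ("-2147483648") plus the trailing space. 32 bytes cover the header. */
    size_t cap = 32 + (size_t)X * ((size_t)Y * 12 + 1) + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t len = 0;
    len += (size_t)snprintf(buf + len, cap - len, "myid=%d\n", myid);
    for (int n = 0; n < X; n++) {
        for (int m = 0; m < Y; m++)
            len += (size_t)snprintf(buf + len, cap - len, "%d ", U0[n][m]);
        len += (size_t)snprintf(buf + len, cap - len, "\n");
    }
    return buf;
}
```

Because rank 0 would then be the only process writing to stdout, the order in which it prints the received strings is the order that appears on screen.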

It should be said that you will probably find answers out there which say you can put a newline at the end of your string or use fflush() to ensure that the buffers are flushed, but that won't guarantee ordering on the remote end, for the reasons mentioned above.
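What flushing can and cannot do is easy to check without MPI. The sketch below uses two hypothetical helpers, `bytes_delivered` and `flush_demo`, assumes a POSIX system (for `fileno` and `fstat`), and lets a temporary file stand in for stdout. It only illustrates that `fflush` moves bytes from the process's own stdio buffer to the operating system; in an MPI job, the hop after that (through mpiexec or mpirun) is outside the program's control.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno(); assumption: POSIX system */
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind f as the operating system currently sees it,
 * or -1 on error. Bytes still held in the stdio buffer are not counted. */
long bytes_delivered(FILE *f)
{
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write text to a fully buffered temporary file and record the
 * OS-visible size before and after fflush.
 * Returns 0 on success, -1 if the file could not be set up. */
int flush_demo(const char *text, long *before, long *after)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (setvbuf(f, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(f);
        return -1;
    }
    fputs(text, f);               /* lands in the stdio buffer only   */
    *before = bytes_delivered(f);
    fflush(f);                    /* hands the bytes to the OS        */
    *after = bytes_delivered(f);
    fclose(f);
    return 0;
}
```

For a short string, `before` is expected to be smaller than `after` on a typical Linux system. That is all a flush promises: the data has left this process, not that it reaches the screen ahead of another rank's data.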

+1; `fflush(stdout)` is certainly worth trying, but yeah, there's no guarantee, and what works on one system may not on another. — Jonathan Dursi, Jul 10 '13 at 14:38

score 1 · Answer 2 · answered Dec 01 '16 at 18:16

So, you can do something like this:

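#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int size, rank;
    int message = 999;  /* token value, matching the sample output below */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Rank 0 starts the chain: print first, then pass the token on. */
        printf("1 SIZE = %d RANK = %d MESSAGE = %d \n", size, rank, message);
        fflush(stdout);
        if (size > 1) {
            MPI_Send(&message, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
    } else {
        int count;
        MPI_Status status;
        /* Block until the token from the previous rank is available. */
        MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        if (count == 1) {
            MPI_Recv(&message, count, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            printf("2 SIZE = %d RANK = %d MESSAGE = %d \n", size, rank, message);
            fflush(stdout);
            if (rank + 1 != size) {
                /* rank + 1, not ++rank: the rank variable must stay unchanged. */
                MPI_Send(&message, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
        }
    }
    MPI_Finalize();
    return 0;
}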

Output after running:

$ mpirun -n 5 ./a.out 
1 SIZE = 5 RANK = 0 MESSAGE = 999 
2 SIZE = 5 RANK = 1 MESSAGE = 999 
2 SIZE = 5 RANK = 2 MESSAGE = 999 
2 SIZE = 5 RANK = 3 MESSAGE = 999 
2 SIZE = 5 RANK = 4 MESSAGE = 999

Where do you assign 'message'? — EssentialAnonymity, Sep 28 '17 at 22:34

Johannes Blaschke · Answer 3 · 2018-03-27T20:40:52.167

I was inspired by Святослав Павленко's answer, which uses blocking MPI communication to enforce serial-in-time output. That said, Wesley Bland has a point about MPI not being built for serial output. So if we want to output data, it makes sense to have each processor write its own (non-colliding) data. Alternatively, if the order of the data is important (and it's not too big), the recommended approach is to send it all to one CPU (say rank 0), which then formats the data correctly.

To me, this seems to be a bit of overkill, especially when the data can be variable-length strings, which is all too often what std::cout << "a=" << some_variable << " b=" << some_other_variable amounts to. So if we want some quick-and-dirty in-order printing, we can exploit Святослав Павленко's answer to build a serial output stream. This solution works fine, but its performance scales badly with many CPUs, so don't use it for data output!

#include <iostream>
#include <sstream>
#include <mpi.h>

MPI House-keeping:

int mpi_size;
int mpi_rank;

void init_mpi(int argc, char * argv[]) {
    MPI_Init(& argc, & argv);
    MPI_Comm_size(MPI_COMM_WORLD, & mpi_size);
    MPI_Comm_rank(MPI_COMM_WORLD, & mpi_rank);
}

void finalize_mpi() {
    MPI_Finalize();
}

General-purpose class which enables MPI message-chaining

template<class T, MPI_Datatype MPI_T> class MPIChain{
    // Uses a chained MPI message (T) to coordinate serial execution of code (the content of the message is irrelevant).
    private:
        T message_out; // The messages aren't really used here
        T message_in;
        int size;
        int rank;

    public:
        void next(){
            // Send message to next core (if there is one)
            if(rank + 1 < size) {
                // MPI_Send - Performs a standard-mode blocking send.
                MPI_Send(& message_out, 1, MPI_T, rank + 1, 0, MPI_COMM_WORLD);
            }
        }

        void wait(int & msg_count) {
            // Waits for message to arrive. Message is well-formed if msg_count = 1
            MPI_Status status;

            // MPI_Probe - Blocking test for a message.
            MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, & status);
            // MPI_Get_count - Gets the number of top level elements.
            MPI_Get_count(& status, MPI_T, & msg_count);

            if(msg_count == 1) {
                // MPI_Recv - Performs a standard-mode blocking receive.
                MPI_Recv(& message_in, msg_count, MPI_T, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, & status);
            }
        }

        MPIChain(T message_init, int c_rank, int c_size): message_out(message_init), size(c_size), rank(c_rank) {}

        int get_rank() const { return rank;}
        int get_size() const { return size;}
};

We can now use our MPIChain class to create the class which manages the output stream:

class ChainStream : public MPIChain<int, MPI_INT> {
    // Uses the MPIChain class to implement an ostream with a serial operator<< implementation.
    private:
        std::ostream & s_out;

    public:
        ChainStream(std::ostream & os, int c_rank, int c_size)
            : MPIChain<int, MPI_INT>(0, c_rank, c_size), s_out(os) {};

        ChainStream & operator<<(const std::string & os){
            if(this->get_rank() == 0) {
                this->s_out << os;
                // Initiate chain of MPI messages
                this->next();
            } else {
                int msg_count;
                // Wait until a message arrives (MPIChain::wait uses a blocking test)
                this->wait(msg_count);
                if(msg_count == 1) {
                    // If the message is well-formed (i.e. only one message is received): output string
                    this->s_out << os;
                    // Pass onto the next member of the chain (if there is one)
                    this->next();
                }
            }

            // Ensure that the chain is resolved before returning the stream
            MPI_Barrier(MPI_COMM_WORLD);

            // Don't return the underlying ostream! That would break the serial-in-time execution.
            return *this;
        };
};

Note the MPI_Barrier at the end of operator<<. This prevents any rank from starting a second output chain before the current one has finished. Even though this could be moved outside operator<<, I figured that I would put it here, since this is supposed to be serial output anyway.

Putting it all together:

int main(int argc, char * argv[]) {
    init_mpi(argc, argv);

    ChainStream cs(std::cout, mpi_rank, mpi_size);

    std::stringstream str_1, str_2, str_3;
    str_1 << "FIRST:  " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;
    str_2 << "SECOND: " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;
    str_3 << "THIRD:  " << "MPI_SIZE = " << mpi_size << " RANK = " << mpi_rank << std::endl;

    cs << str_1.str() << str_2.str() << str_3.str();
    // Equivalent to:
    //cs << str_1.str();
    //cs << str_2.str();
    //cs << str_3.str();

    finalize_mpi();
}

Note that we build the strings str_1, str_2, str_3 in full before we send them to the ChainStream instance. Normally one would do something like:

std::cout << "a" << "b" << "c" << std::endl;

but this applies operator<< from left-to-right, and we want the strings to be ready for output before sequentially running through each process.

g++-7 -O3 -lmpi serial_io_obj.cpp -o serial_io_obj
mpirun -n 10 ./serial_io_obj

Outputs:

FIRST:  MPI_SIZE = 10 RANK = 0
FIRST:  MPI_SIZE = 10 RANK = 1
FIRST:  MPI_SIZE = 10 RANK = 2
FIRST:  MPI_SIZE = 10 RANK = 3
FIRST:  MPI_SIZE = 10 RANK = 4
FIRST:  MPI_SIZE = 10 RANK = 5
FIRST:  MPI_SIZE = 10 RANK = 6
FIRST:  MPI_SIZE = 10 RANK = 7
FIRST:  MPI_SIZE = 10 RANK = 8
FIRST:  MPI_SIZE = 10 RANK = 9
SECOND: MPI_SIZE = 10 RANK = 0
SECOND: MPI_SIZE = 10 RANK = 1
SECOND: MPI_SIZE = 10 RANK = 2
SECOND: MPI_SIZE = 10 RANK = 3
SECOND: MPI_SIZE = 10 RANK = 4
SECOND: MPI_SIZE = 10 RANK = 5
SECOND: MPI_SIZE = 10 RANK = 6
SECOND: MPI_SIZE = 10 RANK = 7
SECOND: MPI_SIZE = 10 RANK = 8
SECOND: MPI_SIZE = 10 RANK = 9
THIRD:  MPI_SIZE = 10 RANK = 0
THIRD:  MPI_SIZE = 10 RANK = 1
THIRD:  MPI_SIZE = 10 RANK = 2
THIRD:  MPI_SIZE = 10 RANK = 3
THIRD:  MPI_SIZE = 10 RANK = 4
THIRD:  MPI_SIZE = 10 RANK = 5
THIRD:  MPI_SIZE = 10 RANK = 6
THIRD:  MPI_SIZE = 10 RANK = 7
THIRD:  MPI_SIZE = 10 RANK = 8
THIRD:  MPI_SIZE = 10 RANK = 9

score 0 · Answer 4 · answered Jul 10 '13 at 15:42

The MPI standard doesn't specify how stdout from different nodes should be collected and fflush doesn't help.

If you need to print big outputs in order, the best solution is probably not to gather them all and print at once, because this will generate traffic over the network. A better solution is to create something similar to a virtual ring, where each process waits for a token from the previous process, prints, and sends the token to the next one. Of course, the first process doesn't have to wait; it prints and sends the token to the next one.

Anyway, for really big output, where printing to the screen probably makes no sense, you should use MPI-IO as suggested by Jonathan Dursi.
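The MPI-IO route needs every rank to know where in the shared file its text belongs. One way to get there is plain arithmetic, assuming fixed-width text fields (an assumption of this sketch, not something stated in this thread), ranks stacked by rows, and X rows of Y values on every rank. The names `row_bytes`, `rank_offset`, and `format_row` are hypothetical; the resulting offset is the kind of value one would pass to `MPI_File_write_at`.

```c
#include <stdio.h>

/* Bytes in one text row: Y fields of `width` characters plus one space
 * each, followed by a newline. */
long long row_bytes(int Y, int width)
{
    return (long long)Y * (width + 1) + 1;
}

/* Byte offset at which `rank` starts writing, assuming every rank
 * holds X rows of Y values and ranks are stacked by rows. */
long long rank_offset(int rank, int X, int Y, int width)
{
    return (long long)rank * X * row_bytes(Y, width);
}

/* Format one row into `out`, which must hold row_bytes(Y, width) + 1 bytes.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if a value does not fit into `width` characters. */
int format_row(const int *row, int Y, int width, char *out)
{
    int len = 0;
    for (int m = 0; m < Y; m++) {
        int n = snprintf(out + len, (size_t)width + 2, "%*d ", width, row[m]);
        if (n != width + 1)
            return -1;          /* value is wider than the field */
        len += n;
    }
    out[len++] = '\n';
    out[len] = '\0';
    return len;
}
```

With fixed-width rows no rank has to wait for another: each one formats its own rows and writes them at its own offset, so the order in the file follows from the offsets rather than from timing.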

score 0 · Answer 5 · answered Nov 28 '20 at 22:19

For debugging and development purposes, you can run each process in a separate terminal, so they print in their own terminal:

mpirun -np n xterm -hold -e ./output

n: number of processors
-hold: keeps the xterm window open after the program is done.
output: name of MPI executable

score -1 · Answer 6 · edited Jun 25 '21 at 07:33

In C++, I print everything once from a single given rank, and the output appears in order, without interleaving:

cout<<"The capabilities of Node "<<node_name<<" are: \n";

cout<<"no of real cores = "<<rcores<< " \n";
cout<<"no of virtual cores = "<<vcores<<" \n";
cout<<"clock speed of Processor = "<<speed<<" MHz \n";
cout<<"RAM size is "<<ramsize<<"\n"<<endl;

Output: (screenshot of output, image not included)

answered Jun 24 '21 at 02:03 by David bhatt; edited Jun 25 '21 at 07:33 by Alex Guteniev
