The MPI standard does not specify how stdout from the different processes is ordered, i.e. the order in which the ranks print is undefined.
When you print a large string, you can even see the output from different ranks interleaved. To summarize, if you need the output in order from rank 0 to rank N, you have to write extra code for it. Rank 0 always printing last in your example is not deterministic either; it will vary from run to run.
Below is an excerpt from the book An Introduction to Parallel Programming by Peter Pacheco about I/O in MPI.
Although the MPI standard doesn’t specify which processes have access
to which I/O devices, virtually all MPI implementations allow all the processes in
MPI_COMM_WORLD full access to stdout and stderr, so most MPI implementations
allow all processes to execute printf and fprintf(stderr, ...).
However, most MPI implementations don’t provide any automatic scheduling of
access to these devices. That is, if multiple processes are attempting to write to, say, stdout, the order in which the processes’ output appears will be unpredictable.
Indeed, it can even happen that the output of one process will be interrupted by the
output of another process. The reason this happens is that the MPI processes are “competing” for access to
the shared output device, stdout, and it’s impossible to predict the order in which the
processes’ output will be queued up. Such a competition results in nondeterminism.
That is, the actual output will vary from one run to the next.
See page 98 of the book for more.