I see a lot of threads asking why elapsed (wall-clock) time can be less than user + kernel time, and I understand how multi-threading can cause that. However, I have the opposite problem when timing a run of some MPI code via:
$ time mpirun -n 4 ./a.out
I'm seeing elapsed times of 4 to 5 minutes, but user times of only about 40 seconds and kernel times of about 40 seconds. I suspect that barrier synchronization between processes could be part of the cause, or perhaps that time is only reporting on a single MPI process, but I still can't work out exactly what accounts for the gap. Can anyone explain it?
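To make the kind of gap being described concrete, here is a minimal sketch in Python (not the MPI code in question) that reads wall, user, and kernel time for a single process around a call that spends its time blocked. The names measure and blocked and the 0.5-second sleep are made up for illustration. The sleep only stands in for any wait that uses no CPU, such as waiting on I/O. Whether an MPI barrier actually behaves this way depends on the MPI implementation, so this shows the accounting, not the cause.

```python
import os
import time


def measure(fn):
    """Return (wall, user, sys) seconds spent in this process while running fn()."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


def blocked():
    # Hypothetical stand-in for a wait that consumes no CPU time.
    time.sleep(0.5)


if __name__ == "__main__":
    wall, user, sys_time = measure(blocked)
    print(f"real {wall:.3f}s  user {user:.3f}s  sys {sys_time:.3f}s")
```

Running it should show a real time near the sleep duration with user and sys close to zero, which is the same shape as the readings above: time spent blocked counts toward elapsed time but not toward user or kernel time.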
Thanks very much.