
I am trying to run a multi-node job with aprun. However, I can't figure out how to get the rank (or whatever serves as the unique ID of each instance) in the bash environment. Take this simple job:

aprun -n 8 -N 2 ./examplebashscript.sh

How can I get the rank in each spawned instance? Without something like a rank or some other unique ID, this aprun line will just run the exact same program 8 times, which is not what I want.

I've been reading the documentation, and surprisingly I couldn't find anything that even explains the default environment variables aprun provides.

I've worked with mpirun before, and I know how to get the rank of each process from C and Python programs, but not from Bash. aprun is even less documented.

  • I am not at all familiar with `aprun`, and you're right, from looking at it, the documentation is not very good. But one thing that I would try would be just dumping the environment variables using `env` to a file somewhere, and seeing if the information is passed in via environment variables. You could use something like `env > $(hostname)-$$.env` to write out to a file named based on the hostname and PID of the process running, to hopefully get separate results per invocation. – Brian Campbell Mar 13 '15 at 18:48
  • I've just tried it, and unfortunately I don't see anything close to what I need. There are some SLURM variables (like SLURM_NNODES and SLURM_JOBID), but they are the same in all the jobs. So I still need someone to shed some light on exactly how to run unique jobs with aprun. – user4668442 Mar 13 '15 at 19:14
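
Following the suggestion above, a minimal probe script (the name probe.sh is just an illustration) could be:

#!/bin/bash
# Dump this instance's environment to a file named after the host and
# PID, so the output of each spawned copy can be compared afterwards.
env | sort > "$(hostname)-$$.env"

Launching it with aprun -n 8 -N 2 ./probe.sh and diffing the resulting files would show which variables, if any, differ between instances.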

3 Answers


One way of doing this that may work is to write a wrapper script that takes a range of task indices and then spawns a separate copy of your script for each one.

In your fragment it looks like you want to run 2 instances of the script per compute node, 8 in total, so in your job script you could do something like:

# Launch four single-PE aprun commands in the background; each wrapper
# gets a starting index ($i) and the number of instances to run (2).
for (( i=0; i<8; i+=2 )); do
   aprun -n 1 ./wrapper.sh $i 2 &
done
wait   # block until all backgrounded aprun commands have finished

then in wrapper.sh you could do something like this (where $j gives you your unique index):

# $1 is the starting index, $2 the number of instances to launch.
end=$(( $1 + $2 ))
for (( j=$1; j<$end; j+=1 )); do
   ./examplebashscript.sh $j &   # pass the unique index as argument 1
done
wait
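
examplebashscript.sh can then pick up its unique index from its first argument; a minimal sketch (only the argument handling is the point, the echo body is illustrative):

#!/bin/bash
# $1 is the unique index handed down by wrapper.sh
rank=$1
echo "Task $rank running on $(hostname)"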

You can also set the following environment variables to have the placement of the various processes and threads reported. You need to set these in your shell (or job script) before you run aprun:

export MPICH_CPUMASK_DISPLAY=1
export MPICH_RANK_REORDER_DISPLAY=1

For example, running:

aprun -n 24 ./examplebashscript.sh

(which is shorthand for):

aprun -n 24 -N 24 -S 12 -d 1 ./examplebashscript.sh

will give you output of the following type on STDERR (note this is on an XC30 with two 12-core Intel Ivy Bridge processors per compute node, so the mask covers 48 logical cores per node because hyperthreads are present):

[PE_0]: MPI rank order: Using default aprun rank ordering.
[PE_0]: rank 0 is on nid02749
[PE_0]: rank 1 is on nid02749
[PE_0]: rank 2 is on nid02749
[PE_0]: rank 3 is on nid02749
[PE_0]: rank 4 is on nid02749
[PE_0]: rank 5 is on nid02749
[PE_0]: rank 6 is on nid02749
[PE_0]: rank 7 is on nid02749
[PE_0]: rank 8 is on nid02749
[PE_0]: rank 9 is on nid02749
[PE_0]: rank 10 is on nid02749
[PE_0]: rank 11 is on nid02749
[PE_0]: rank 12 is on nid02749
[PE_0]: rank 13 is on nid02749
[PE_0]: rank 14 is on nid02749
[PE_0]: rank 15 is on nid02749
[PE_0]: rank 16 is on nid02749
[PE_0]: rank 17 is on nid02749
[PE_0]: rank 18 is on nid02749
[PE_0]: rank 19 is on nid02749
[PE_0]: rank 20 is on nid02749
[PE_0]: rank 21 is on nid02749
[PE_0]: rank 22 is on nid02749
[PE_0]: rank 23 is on nid02749
[PE_23]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000100000000000000000000000
[PE_22]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000010000000000000000000000
[PE_21]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000001000000000000000000000
[PE_0]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000000001
[PE_20]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000100000000000000000000
[PE_9]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000001000000000
[PE_11]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000100000000000
[PE_10]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000010000000000
[PE_8]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000100000000
[PE_1]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000000010
[PE_2]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000000100
[PE_18]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000001000000000000000000
[PE_7]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000010000000
[PE_15]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000001000000000000000
[PE_3]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000001000
[PE_6]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000001000000
[PE_16]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000010000000000000000
[PE_14]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000100000000000000
[PE_13]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000010000000000000
[PE_12]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000001000000000000
[PE_4]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000010000
[PE_5]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000000000000000100000
[PE_17]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000000100000000000000000
[PE_19]: cpumask set to 1 cpu on nid02749, cpumask = 000000000000000000000000000010000000000000000000

You may be able to capture this output and parse it to recover each rank's placement.
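
For example, one way to capture it (the file name placement.log is arbitrary, and this is only a sketch) is to redirect aprun's standard error and pull out the rank-to-node lines afterwards:

# The placement report goes to STDERR, so redirect it to a file,
# then extract the "rank N is on nidXXXXX" lines.
aprun -n 24 ./examplebashscript.sh 2> placement.log
grep 'is on nid' placement.log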

AndyT

Try looking for the environment variable ALPS_APP_PE in the bash script that you have launched with aprun.

It will be different for each instance of the script (the number of instances created is given by the -n option of the aprun command).

If the script subsequently executes an instance of an MPI program, that instance will have the MPI rank given by ALPS_APP_PE.

The caveat is that some Cray sites may decide not to expose this variable, or to use a different name. Very old ALPS versions also don't support it, but these are rare.
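
Assuming your site's ALPS does expose it, a minimal sketch of what examplebashscript.sh could do with ALPS_APP_PE (the echo body is purely illustrative):

#!/bin/bash
# ALPS_APP_PE holds this instance's placement index (0 .. n-1);
# fall back to "unknown" in case the site does not expose it.
rank=${ALPS_APP_PE:-unknown}
echo "Instance $rank running on $(hostname)"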

See this CUG 2014 paper for an example:

https://cug.org/proceedings/cug2014_proceedings/includes/files/pap136.pdf

ahart

Assuming that you're running on a recent Cray, you can't. Your script executes on the login node, and the aprun command launches the application on the compute nodes.

Your launched application can get the rank by initialising MPI and then calling MPI_Comm_rank.

Rupert Nash