
I'm running an OpenFOAM simulation on a cluster. I have used the Scotch decomposition method and my decomposeParDict looks like this:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}
numberOfSubdomains 6;
method          scotch;

checkMesh and decomposePar finish with no issues. I have assigned 6 nodes to the Slurm job with

srun -N6 -l sonicFoam

and the solver runs smoothly without any errors.

The issue is that the solution speed is not improved compared to the non-parallel simulation I ran before. I want to monitor the CPU usage to see whether all 6 nodes I have assigned are similarly loaded. The squeue --user=foobar command returns the job number and the list of assigned nodes (NODELIST(REASON)), which looks like this:

foo,bar[061-065]

From the sinfo command, these nodes are in both the debug and main* PARTITIONs (I have absolutely no idea what that means!).

This post says that you can use the sacct or sstat commands to monitor the CPU time and memory usage of a Slurm job. But when I run

sacct --format="CPUTime,MaxRSS"

it gives me:

 CPUTime     MaxRSS
---------- ----------
  00:00:00
  00:00:00
  00:07:36
  00:00:56
  00:00:26
  15:26:24

which I cannot make sense of. And when I specify the job number with

sacct --job=<jobNumber> --format="UserCPU"

the output is empty. So my questions are:

  • Is my simulation loading all nodes, or is it running on one or two while the rest sit idle?
  • Am I running the right commands? If yes, what do those numbers mean, and how do they represent the CPU usage per node?
  • If not, what are the right --format="..." fields for sacct and/or sstat (or maybe other Slurm commands) to get the CPU usage/load?

P.S.1. I compiled OpenFOAM following the official instructions. I did not do anything with OpenMPI or its mpicc compiler, for that matter.

P.S.2. For those of you who might end up here: maybe I'm running the wrong command. Apparently one can first allocate some resources with:

srun -N 1 --ntasks-per-node=7 --pty bash

where 7 is the number of cores you want and bash is the interactive shell launched on the allocated node, and then run the solver with:

mpirun -np 7 sonicFoam -parallel -fileHandler uncollated

I'm not sure yet though.

Foad S. Farimani

2 Answers


You can use

sacct --format='jobid,AveCPU,MinCPU,MinCPUTask,MinCPUNode'

to check whether all CPUs have been active. Compare AveCPU (average CPU time of all tasks in the job) with MinCPU (minimum CPU time of all tasks in the job). If they are equal, all 6 tasks (you requested 6 nodes with, implicitly, 1 task per node) worked equally hard. If they are not equal, or if MinCPU is even zero, then some tasks have been doing nothing.
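To look at a single job (and see how many tasks each step actually had), something along these lines should work, with <jobNumber> being the ID shown by squeue; note that sacct only has data to show if job accounting is enabled on the cluster:

sacct --jobs=<jobNumber> --format=JobID,NTasks,AveCPU,MinCPU,MinCPUTask,MinCPUNode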

But in your case, I believe you will observe that all tasks have been working hard, but they were all doing the same thing.

Besides the remark concerning the -parallel flag by @timdykes, you also must be aware that launching an MPI job with srun requires that OpenMPI was compiled with Slurm support. During your installation of OpenFOAM, it installed its own version of OpenMPI, and if the file /usr/include/slurm/slurm.h or /usr/include/slurm.h existed at that point, then Slurm support was probably compiled in. But the safest option is probably to use mpirun.
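One rough way to check, assuming the ompi_info that belongs to the OpenFOAM-provided OpenMPI is on your PATH, is to look for Slurm components in its component list:

ompi_info | grep -i slurm

If components mentioning slurm (e.g. ras or plm) show up, that OpenMPI build knows about Slurm; if nothing appears, stick to mpirun.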

But to do that, you will have to first request an allocation from Slurm with either sbatch or salloc.
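A minimal sketch of the batch route (untested on your cluster; the job name, node count and one-rank-per-node layout are just illustrative, and it assumes decomposePar has already been run in the case directory). Whether mpirun actually spreads the 6 ranks over all 6 nodes depends on the Slurm support of the OpenMPI build, as discussed above:

#!/bin/bash
#SBATCH --job-name=sonicFoam      # hypothetical job name
#SBATCH --nodes=6                 # same node count as the decomposition
#SBATCH --ntasks-per-node=1       # one MPI rank per node

# decomposePar has already been run, so just launch the solver
mpirun -np 6 sonicFoam -parallel

Save it to a file in the case directory and submit it with sbatch.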

damienfrancois
  • Thanks for the post. I tried `sacct --format='jobid,AveCPU,MinCPU,MinCPUTask,MinCPUNode'` and all the fields are empty! Would you please elaborate on 1. how to make sure the OpenMPI on my system was compiled with Slurm? You may note that I don't have admin privileges, of course. 2. I checked and `/usr/include/slurm/slurm.h` exists, but I don't think it was installed with my OpenFOAM. 3. What is the correct way of running `mpirun` with `sbatch` and `salloc`? Can you give an example please? – Foad S. Farimani Feb 02 '18 at 09:49
  • 1
    I might mean that accounting was not configured in Slurm. What is the output of `scontrol show config | grep JobAcctGatherType`? – damienfrancois Feb 02 '18 at 09:52
  • 1
    1. OpenFOAM does not use the pre-installed OpenMPI, it comes with its own version, that gets compiled when you run `./Allwmake `. So somewhere, in your OpenFOAM directory, must be an OpenMPI directory, and a `config.log` with details of how OpenMPI was compiled. – damienfrancois Feb 02 '18 at 09:54
  • `scontrol show config | grep JobAcctGatherType` --> `JobAcctGatherType = jobacct_gather/none` – Foad S. Farimani Feb 02 '18 at 09:55
  • 1
    2. No indeed it was not installed with your OpenFOAM, it was installed with Slurm. But your OpenFOAM must have picked it when being compiled. – damienfrancois Feb 02 '18 at 09:55
  • 1
    3. Interactive option: `salloc -N6 mpirun sonicFoam`. Batch option: `sbatch -N6 --wrap "mpirun sonicFoam"` – damienfrancois Feb 02 '18 at 09:57
  • Sorry if my questions seem primitive. I am very new to the whole Slurm/MPI/OpenFOAM world. Should I compile my OpenFOAM again to be sure it is using the preinstalled Slurm-aware OpenMPI? How should I do that?! – Foad S. Farimani Feb 02 '18 at 09:58
  • 1
    `jobacct_gather/none` means the information is not collected I am afraid. – damienfrancois Feb 02 '18 at 09:58
  • 1
    No, it is preferable to use the OpenFOAM-provided OpenMPI. Stick to using `mpirun` rather than `srun` and it should be fine. – damienfrancois Feb 02 '18 at 09:59
  • in `salloc -N6 mpirun sonicFoam` what is the `` entry? – Foad S. Farimani Feb 02 '18 at 10:01
  • 1
    it means hit the Enter key – damienfrancois Feb 02 '18 at 10:04
  • I'm trying `sbatch -N6 --wrap "mpirun sonicFoam"`; it runs, but there is still no way to tell whether I'm using all the nodes/CPUs equally! – Foad S. Farimani Feb 02 '18 at 10:07
  • 1
    For that, you will have to peek into the OpenFAOM output logs and see whether it split the work or not. But that is out of my area of expertise I am afraid – damienfrancois Feb 02 '18 at 10:10

Have you tried running with the '-parallel' argument? All of the OpenFOAM examples online use this argument when running a parallel job, one example is the official guide for running in parallel.

srun -N $NTASKS -l sonicFoam -parallel

As an aside: I saw you built OpenFOAM yourself; have you checked whether the cluster admins have provided a module for it? You can usually run module avail to see a list of the available modules, and then module load moduleName if there is an existing OpenFOAM module. This is useful as you can probably trust it has been built with all the right options, and it would automatically set up your $PATH etc.
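For example (the module name here is purely hypothetical; use whatever name module avail actually lists):

module avail
module load OpenFOAM    # hypothetical name; substitute the exact one shown by 'module avail'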

timdykes
  • Thanks for the post: 1. I tried `module avail` and there are no OpenFOAM modules available. Not unexpected; this server is meant for neural-network guys and I'm kind of abusing it for CFD :) 2. I was not aware of the `-parallel` flag, as it was not mentioned in [the official page for OpenFOAM parallelization](https://cfd.direct/openfoam/user-guide/running-applications-parallel/). I will try it now. 3. [Here](https://www.youtube.com/watch?v=bHMdh_l45M0) I see that I must run the code with `mpirun`. I am gonna try that too, but there is still no way to compare them as I can't monitor the CPUs. – Foad S. Farimani Feb 02 '18 at 09:06
  • I checked: 1. `mpirun -np 6 sonicFoam -parallel` works, but there is still no way to tell if it is harvesting all 6 nodes/CPUs. 2. A little weird, but `srun -N6 -l mpirun -np 6 sonicFoam -parallel` also runs! 3. `srun -N6 -l sonicFoam -parallel` leads to a very long, cryptic error which I cannot comprehend. Maybe I can log it and post it as a new question. – Foad S. Farimani Feb 02 '18 at 09:38
  • 1
    In regards to 2. while the argument isnt mentioned explicitly in the text, you can see it used in the example in 3.2.3, and a brief google search for 'OpenFOAM example slurm' shows many examples all of which use the -parallel flag. For 3. You need to use whichever MPI launch wrapper is appropriate for your machine, if it is a cluster with SLURM (looks like it) then srun is probably the most appropriate command. If not sure, you should check with your administators (probably you have a 'Getting Started' guide for the cluster that explains the appropriate usage for MPI jobs?). – timdykes Feb 02 '18 at 09:39
  • I have not googled `OpenFOAM` and `slurm` together yet. Gonna do that now and come back with the results. Thanks for the hint. – Foad S. Farimani Feb 02 '18 at 10:03