I'm trying to run the Intel version of the HPL benchmark, and I'm a bit confused by the options.
For now I just want a single-node run. The node has 2x Xeon Platinum 8276 processors (28 cores each), so 56 cores total, which I took to mean my PxQ should be 56.
However, the Intel docs say:
- MPI_PROC_NUM should be equal to PxQ (i.e. 56); this gets passed to mpirun -np
- MPI_PER_NODE should be equal to the number of sockets in the system (i.e. 2); this gets passed to mpirun -perhost
Those two don't seem consistent to me: on a single node, 56 processes in total but only 2 processes per host can't both be true. And how does OMP_NUM_THREADS fit into this?
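For reference, here's the arithmetic as I currently understand it, written as a small Python sketch. It rests on my own assumptions, not on anything the docs state: each MPI rank runs OMP_NUM_THREADS OpenMP threads, ranks x threads should cover every core, and PxQ must equal MPI_PROC_NUM (that last part is from the docs). The helper names `hybrid_layout` and `pq_grids` are just mine, not part of Intel's package.

```python
import math

# Hypothetical helpers (not part of Intel's HPL package).
# ASSUMES: each MPI rank runs OMP_NUM_THREADS threads, ranks x threads
# should cover every core, and P x Q == MPI_PROC_NUM (per the Intel docs).

def hybrid_layout(nodes, sockets, cores_per_socket, ranks_per_node):
    """Return (mpi_proc_num, omp_num_threads) for a hybrid MPI+OpenMP run."""
    cores_per_node = sockets * cores_per_socket
    if cores_per_node % ranks_per_node != 0:
        raise ValueError("ranks_per_node must divide the cores per node")
    mpi_proc_num = nodes * ranks_per_node               # value for mpirun -np
    omp_num_threads = cores_per_node // ranks_per_node  # threads per rank
    return mpi_proc_num, omp_num_threads

def pq_grids(n_ranks):
    """All (P, Q) pairs with P <= Q and P * Q == n_ranks."""
    return [(p, n_ranks // p)
            for p in range(1, math.isqrt(n_ranks) + 1)
            if n_ranks % p == 0]

if __name__ == "__main__":
    # One node, 2 sockets x 28 cores (Xeon Platinum 8276).
    for per_node in (56, 2):  # one rank per core vs. one rank per socket
        procs, threads = hybrid_layout(1, 2, 28, per_node)
        print(f"MPI_PER_NODE={per_node}: MPI_PROC_NUM={procs}, "
              f"OMP_NUM_THREADS={threads}, PxQ options={pq_grids(procs)}")
```

If those assumptions hold, following the "one process per socket" advice would give MPI_PROC_NUM = 2, PxQ = 1x2 and 28 OpenMP threads per rank, rather than PxQ = 56. That's the part I'd like someone to confirm.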