I'm confused about how multiple launches of same python command bind to cores on a NUMA Xeon machine.
I read that OMP_NUM_THREADS
env var sets the number of threads launched for a numactl
process. So if I ran numactl --physcpubind=4-7 --membind=0 python -u test.py
with OMP_NUM_THREADS=4
on a hyperthreaded HT machine (lscpu output below) it'd limit the this numactl process to 4 threads.
But since machine has HT, it's not clear to me if 4-7
in the above are 4 physical or 4 logical.
How to find which of the numa-node-0 cores in
0-23,96-119
are physical and which ones logical? Are96-119
all logical or are they interspersed?If
4-7
are all physical cores, then with HT on there would be only 2 physical cores needed, so what happens to the other 2?Where is OpenMP library getting invoked in binding threads to physical cores?
(from my limited understanding I could just launch command python main.py
in a sh
shell 20 times with different numactl bindings and OMP_NUM_THREADS still applies, even though I didn't explicitly use MPI lib anywhere, is that correct?)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz
Stepping: 7
Frequency boost: enabled
CPU MHz: 1000.026
CPU max MHz: 2301,0000
CPU min MHz: 1000,0000
BogoMIPS: 4600.00
L1d cache: 3 MiB
L1i cache: 3 MiB
L2 cache: 96 MiB
L3 cache: 143 MiB
NUMA node0 CPU(s): 0-23,96-119
NUMA node1 CPU(s): 24-47,120-143
NUMA node2 CPU(s): 48-71,144-167
NUMA node3 CPU(s): 72-95,168-191