I am running a simulation on a Linux machine with the following specs.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 3099.902
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
This is the run command line script for my solver.
/path/to/meshfree/installation/folder/meshfree_run.sh # on 1 (serial) worker
/path/to/meshfree/installation/folder/meshfree_run.sh N # on N parallel MPI processes
I share the system with another colleague of mine. He uses 10 cores for his solution. What would be the fastest option for me in this case? Using 30 MPI processes?
I am a Mechanical Engineer with very little knowledge on parallel computing. So please excuse me if the question is too stupid.