What are the key differences between general-purpose processors and HPC processors? I am referring to the aspects of CPU organization that influence the performance of my program, such as memory bandwidth, the maximum number of parallel loads, the maximum number of parallel stores, etc. Links to outside resources are also welcome.
High Performance Computing clusters (almost?) invariably use off-the-shelf CPUs like Intel Xeon, AMD Epyc, or IBM POWER: exactly the same CPUs used for other server roles (like database servers).
HPC clusters tend to add low-latency / high-bandwidth interconnects like InfiniBand, rather than "just" 10G Ethernet, and also compute cards (based on GPU architectures) for the real numeric heavy lifting.
That's why Intel and AMD don't sell CPUs with fewer FMA units for the database-server role: every server model uses the same core. (Skylake-server actually is available with either one or two 512-bit FMA units per core, but that's the first time in many generations of CPUs that there's been an option like that.)
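
To put a rough number on the one-vs-two FMA unit difference, here is a back-of-the-envelope sketch of theoretical peak double-precision throughput per core. The function name and the 2.0 GHz clock are assumptions made up for illustration; real clocks under 512-bit vector load vary by model and are not taken from any datasheet.

```python
def peak_dp_gflops_per_core(fma_units, vector_bits=512, clock_ghz=2.0):
    """Theoretical peak double-precision GFLOP/s for one core.

    clock_ghz is a hypothetical value chosen for illustration only.
    """
    lanes = vector_bits // 64   # 64-bit doubles per vector (8 for 512-bit)
    flops_per_fma = 2           # each FMA is one multiply plus one add per lane
    return fma_units * lanes * flops_per_fma * clock_ghz


for units in (1, 2):
    print(f"{units} FMA unit(s): {peak_dp_gflops_per_core(units)} GFLOP/s per core")
```

This is only an upper bound on arithmetic throughput. A program dominated by cache misses won't get close to it on either configuration.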

Peter Cordes
- I ask this question because I was parallelizing an application using OpenMP. The application had a lot of cache misses; on my system with 16 threads I got a 2.7x speedup, and on an Intel Xeon HPC system with the same number of threads I got about a 7x speedup. – Bogi Feb 11 '21 at 18:19
- @Bogi: Is your system an Intel Coffee Lake desktop or something? Probably [Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?](https://stackoverflow.com/q/39260020) is relevant: a Xeon (especially a big Xeon, and especially a multi-socket system) has high aggregate bandwidth, but lower single-threaded memory bandwidth than a "client" chip like a desktop. Also, if the Xeon has 16 *physical* cores (and more logical cores), then 16 threads can each run on their own physical core, instead of pairs of threads sharing the two logical cores of one physical core as they would on an 8c16t desktop. – Peter Cordes Feb 11 '21 at 20:37
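
The numbers in these comments can be put side by side with a small sketch. Both helper functions are hypothetical names introduced here, and the core counts are assumptions: 8 physical cores for the desktop and 16 for the Xeon, as the comment above speculates. It also assumes the OS spreads threads across physical cores before doubling up on logical ones.

```python
def threads_per_physical_core(n_threads, physical_cores):
    """Most threads any one physical core must host, assuming an even spread."""
    return -(-n_threads // physical_cores)  # ceiling division


def parallel_efficiency(speedup, n_threads):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_threads


# Assumed topologies: 8c16t desktop vs. a Xeon with 16 physical cores.
for name, cores, speedup in (("desktop", 8, 2.7), ("Xeon", 16, 7.0)):
    shared = threads_per_physical_core(16, cores)
    eff = parallel_efficiency(speedup, 16)
    print(f"{name}: {shared} thread(s) per physical core, efficiency {eff:.1%}")
```

Under these assumptions the desktop runs two threads per physical core and the Xeon runs one. The measured speedups work out to roughly 17% and 44% of linear scaling with 16 threads.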