
I am looking for a benchmark that measures the computing capability of each core in my system (a supercomputer). In other words, I want to find the realistically achievable maximum floating-point operations per second on a single processor. I found a benchmark named SPEC that might be useful, but it is not free. Could you suggest a suitable benchmark for this purpose?

Any help would be appreciated.

Matrix
  • Realistically achievable GFLOP/s for what kind of workload? I think if you avoid memory bottlenecks, carefully tuned code can achieve the theoretical max of two 256-bit vector FMA instructions per clock (on each core) on Intel Haswell and later. A good matrix-multiply implementation should come very close to this, IIRC. Different workloads will differ vastly. SPECfp is a collection of specific FP workloads. IDK if any of them bottleneck on pure GFLOP/s throughput in a way that would let the compiler auto-vectorize to something that saturates the FMA units, but I highly doubt it. – Peter Cordes Sep 04 '16 at 12:44
  • @PeterCordes Thanks for the comment. I would use the Intel modules. But how can I obtain that figure? Is there a benchmark application I can use? I want to measure it without the impact of memory bandwidth. – Matrix Sep 04 '16 at 12:55
  • Just calculate the theoretical max: 8 single-precision floats/vector * 2 vector FMAs/clock * 2 FLOPs/FMA * clock speed, e.g. 96 GFLOP/s per physical core for a 3 GHz Haswell/Broadwell/Skylake. See http://stackoverflow.com/questions/15655835/flops-per-cycle-for-sandy-bridge-and-haswell-sse2-avx-avx2 for details. Don't expect most real workloads to achieve this, but the hardware really can do it in practice, with a bit of room left over for loop overhead and/or loads/stores. Hyperthreading can help come closer to saturating the hardware when latency or branching is part of the bottleneck in real code. – Peter Cordes Sep 04 '16 at 13:08
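
The arithmetic in the last comment can be written as a tiny script. This is only a sketch of the theoretical-peak formula, not a benchmark: the function name `peak_gflops` and its parameter names are made up for illustration, and the defaults assume a Haswell-class core (256-bit vectors, two FMA units per core) as described in the comment. The double-precision line is an extra assumption beyond the thread, based on a 256-bit vector holding 4 doubles instead of 8 floats.

```python
def peak_gflops(clock_ghz, simd_lanes=8, fma_units=2, flops_per_fma=2):
    """Theoretical peak GFLOP/s for one physical core.

    clock_ghz     -- sustained core clock in GHz (assumed constant)
    simd_lanes    -- elements per vector (8 floats in a 256-bit vector)
    fma_units     -- vector FMA instructions issued per clock
    flops_per_fma -- one multiply plus one add per element
    """
    return simd_lanes * fma_units * flops_per_fma * clock_ghz


if __name__ == "__main__":
    # Single precision on a 3 GHz core: 8 * 2 * 2 * 3 = 96 GFLOP/s
    print(peak_gflops(3.0))
    # Double precision (assumption: 4 doubles per 256-bit vector): 48 GFLOP/s
    print(peak_gflops(3.0, simd_lanes=4))
```

Multiply the result by the number of physical cores for a whole-chip figure. The actual clock under sustained vector load may be lower than the nominal one, so treat the output as an upper bound.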

0 Answers