Computing capability of each core

Question

I am looking for a benchmark that measures the computing capability of each core in my system (a supercomputer). In other words, I want to find the realistically achievable maximum floating-point operations per second on one processor. I found a benchmark suite named SPEC that might be useful, but it is not free. Could you suggest a suitable benchmark for this purpose?

Any help would be appreciated.

Realistically achievable GFLOP/s for what kind of workload? I think if you avoid memory bottlenecks, carefully tuned code can achieve the theoretical max of two 256-bit vector FMA instructions per clock (on each core) on Intel Haswell and later. A good matrix-multiply implementation should come very close to this, IIRC. Different workloads will give vastly different results. SPECfp is a collection of specific FP workloads. IDK if any of them bottleneck on pure GFLOP/s throughput in a way that would let the compiler auto-vectorize into something that saturates the FMA units, but I highly doubt it. — Peter Cordes, Sep 04 '16 at 12:44
@PeterCordes Thanks for the comment. I would use the Intel modules. But how would I get that number? Is there a benchmark app that I can use? I want to measure it without the impact of memory bandwidth. — Matrix, Sep 04 '16 at 12:55
Just calculate the theoretical max as 8 single-precision floats/vector * 2 vector FMAs/clock * 2 FLOPs/FMA * clock speed, e.g. 96 GFLOP/s per physical core for a 3 GHz Haswell/Broadwell/Skylake. See http://stackoverflow.com/questions/15655835/flops-per-cycle-for-sandy-bridge-and-haswell-sse2-avx-avx2 for details. Don't expect most real workloads to achieve this, but the hardware really can do it in practice, with a bit of room left over for loop overhead and/or loads/stores. Hyperthreading can help come closer to saturating the HW when latency or branching is part of the bottleneck in real code. — Peter Cordes, Sep 04 '16 at 13:08
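
The arithmetic in the comment above can be written out as a small Python sketch. The function name `peak_gflops` and its parameter names are hypothetical, chosen only for illustration. The defaults (256-bit vectors, two FMAs per clock, two FLOPs per FMA) are the Haswell-class figures quoted in the comment. This gives a theoretical ceiling, not a measurement.

```python
def peak_gflops(clock_ghz, vector_bits=256, element_bits=32,
                fma_per_clock=2, flops_per_fma=2):
    """Theoretical peak GFLOP/s for one physical core.

    Defaults assume the Haswell-class numbers from the comment thread:
    256-bit vectors and two vector FMAs issued per clock. Adjust them
    for other microarchitectures (these values are assumptions, not
    detected from the hardware).
    """
    # Number of floating-point elements that fit in one vector register.
    lanes = vector_bits // element_bits
    # elements/vector * FMAs/clock * FLOPs/FMA * clocks/ns = GFLOP/s
    return lanes * fma_per_clock * flops_per_fma * clock_ghz


if __name__ == "__main__":
    # Single precision at 3 GHz: 8 * 2 * 2 * 3.0
    print(peak_gflops(3.0))                    # 96.0
    # Double precision halves the lane count: 4 * 2 * 2 * 3.0
    print(peak_gflops(3.0, element_bits=64))   # 48.0
```

Note that the actual clock speed under sustained AVX load may differ from the nominal one, so the figure passed as `clock_ghz` is itself an assumption.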

0 Answers