How do you compute the maximum number of floating-point operations per second (FLOPS) of a GPU?

For instance, take the GK20A GPU (embedded in the Tegra K1), which runs at up to 852 MHz and has 192 CUDA cores, each of which can do only one basic FP operation per cycle, if I read the specs correctly. My first guess was simply: 852 MHz * 192 cores = about 163 GFLOPS.

However, NVIDIA boasts at least 380 GFLOPS for the Tegra K1. What am I missing?

GaTTaCa
  • 4
    Each CUDA core can execute one IEEE-754 (2008) single-precision FMA (fused multiply-add) instruction per cycle. Each FMA counts as *two* FP operations (one multiply plus one add). – njuffa Feb 17 '15 at 16:39
  • 1
    I can find no reference in which NVIDIA specifically claims 380 GFLOPS for the GPU portion of the Tegra K1. That number you quote from a third party may combine the GPU and CPU floating-point throughput. [NVIDIA's technical brief for Tegra K1](http://developer.download.nvidia.com/embedded/jetson/TK1/docs/Jetson_platform_brief_May2014.pdf) states: "Tegra K1 with the Kepler GPU architecture is a parallel processor capable of over 300 GFLOPS of 32-bit floating point computations." – njuffa Feb 17 '15 at 16:49
  • 2
    Since this question is a general hardware question, which is considered off-topic on SO, I am voting to close it. – njuffa Feb 17 '15 at 16:52
  • The nvidia doc table you linked specifically calls out `multiply-add` as one of the FP32 operations. You're also not reading the anandtech article correctly. There is no claim in there of 380 GFLOPS, by NVIDIA or anyone else. – Robert Crovella Feb 17 '15 at 16:55
  • 1
    Best I can determine, the Tegra K1 GPU delivers a theoretical throughput of 327.2 single-precision GFLOPS, and 13.6 double-precision GFLOPS. – njuffa Feb 17 '15 at 17:20
  • @njuffa: right, I remember reading the slides from NVIDIA's TK1 announcement, where they claimed the TK1 delivers *over* 300 GFLOPS; in my head that 300 later turned into 380 after reading the AnandTech article. Apologies. So it is 852 MHz * 192 CUDA cores * 2 = 327.2 GFLOPS. Thanks a lot. – GaTTaCa Feb 17 '15 at 17:23
  • Shouldn't this be considered a duplicate? http://stackoverflow.com/questions/11912703/how-to-evaluate-cuda-performance – Christian Sarofeen Feb 18 '15 at 02:09
  • @ChristianSarofeen: It does not look like a duplicate of the linked question to me. This question is not even about CUDA per se, it is seeking an explanation for theoretical performance data of a particular GPU based on its hardware specifications. This question is off-topic, the linked question is on-topic but overly broad (IMHO). – njuffa Feb 18 '15 at 04:36
  • @njuffa: OK for the CUDA tag, and I admit this question is maybe too specific. However, your remark that an FMA counts as two operations is valuable information for computing the theoretical performance of a given GPU. – GaTTaCa Feb 18 '15 at 11:27
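
The arithmetic worked out in the comments above can be sketched in a few lines of Python. This is only an illustration of the theoretical peak calculation: the function name `peak_gflops` and its parameters are made up for this example, and the figures (852 MHz, 192 CUDA cores, one FMA per core per cycle counted as two FP operations) are the ones quoted in this thread. Real, sustained throughput will be lower than this upper bound.

```python
def peak_gflops(clock_mhz: float, cores: int, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in GFLOPS.

    flops_per_core_per_cycle defaults to 2 because one FMA
    (fused multiply-add) counts as a multiply plus an add.
    """
    flops = clock_mhz * 1e6 * cores * flops_per_core_per_cycle
    return flops / 1e9


# Figures quoted in the thread for the GK20A GPU in the Tegra K1.
CLOCK_MHZ = 852
CUDA_CORES = 192

# The asker's first guess: one FP operation per core per cycle.
naive = peak_gflops(CLOCK_MHZ, CUDA_CORES, flops_per_core_per_cycle=1)

# Counting each FMA as two FP operations, as explained in the comments.
with_fma = peak_gflops(CLOCK_MHZ, CUDA_CORES)

print(f"one op per cycle:  {naive:.1f} GFLOPS")     # 163.6
print(f"one FMA per cycle: {with_fma:.1f} GFLOPS")  # 327.2
```

The factor of 2 is the only difference between the two results, which is exactly the gap between the first guess and the "over 300 GFLOPS" figure from the NVIDIA brief.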

0 Answers