How do you get the maximal number of floating-point operations per second (FLOPS) out of a GPU?
For instance, take the GK20A GPU (embedded in the Tegra K1): it can run at up to 852 MHz and has 192 CUDA cores, each of which can do only one basic floating-point operation per cycle (if I read the specs correctly). My first guess was therefore: 852 MHz * 192 cores = 163 GFLOPS.
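The back-of-the-envelope estimate above can be sketched as a tiny script (the clock and core count are the figures quoted from the specs; the one-op-per-core-per-cycle assumption is mine):

```python
# Naive peak-throughput estimate for the GK20A (Tegra K1),
# assuming one floating-point operation per CUDA core per cycle.
clock_hz = 852e6   # maximum GPU clock: 852 MHz
cuda_cores = 192   # CUDA cores on the GK20A

naive_gflops = clock_hz * cuda_cores / 1e9
print(int(naive_gflops))  # prints 163
```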
However, Nvidia advertises at least 380 GFLOPS for the Tegra K1. What am I missing?