GFLOPS Equation
For a system with one processor (and one socket), here's the equation:
GFLOPS = number of cores × core frequency (GHz) × number of operations per clock cycle
For the equation, use physical cores, not logical ones (threads). Also, the number of operations a processor core can complete per clock cycle varies depending on the architecture of the processor in question, and on whether you're after single or double precision figures. I'll explain this a little more below.
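As a rough illustration, the equation can be written as a small Python function. The name `peak_gflops` and the example chip (4 physical cores at 3.5 GHz, 16 double precision FLOPs per cycle) are assumptions made up for this sketch, not figures for a specific product.

```python
def peak_gflops(physical_cores, frequency_ghz, flops_per_cycle):
    """Theoretical peak GFLOPS for a single-socket system.

    physical_cores  -- physical cores, not logical threads
    frequency_ghz   -- core clock in GHz
    flops_per_cycle -- FLOPs one core can issue per clock cycle
                       (depends on architecture and precision)
    """
    if physical_cores <= 0 or frequency_ghz <= 0 or flops_per_cycle <= 0:
        raise ValueError("all inputs must be positive")
    return physical_cores * frequency_ghz * flops_per_cycle


# Hypothetical chip: 4 cores, 3.5 GHz, 16 double precision FLOPs/cycle
example = peak_gflops(4, 3.5, 16)
print(f"{example:.1f} GFLOPS")  # prints "224.0 GFLOPS"
```

This is a theoretical peak; real workloads will usually land well below it.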
SSE, SSE2 and 3DNow! Instructions (ISEs)
Calculating the FLOPs performance for older processor architectures is a little more involved than the newer chips we're used to. If you don't plan on calculating the FLOPs/cycle of any chip older than a K8 or Core2, then you can gloss over this section. One thing to take away from this, though, is that instruction set extensions like these can affect the number of FLOPs/cycle a chip can run. For example, a Pentium 4 with no instruction set extensions can perform, at best, 1 FLOP/cycle in single precision. With SSE being utilized, however, it can perform 4 FLOPs/cycle in single precision. Additionally, double precision for a Pentium 4 doubles from 1 FLOP/cycle with no extensions, to 2 FLOPs/cycle using SSE2.
If SSE instructions are supported, 4 single precision FLOPs can be executed with every clock cycle. This applies to both Intel and AMD processors that support SSE instructions.
SSE2 instructions allow for 2 FLOPs with every cycle for double precision arithmetic. SSE2 does not affect single precision. Again, this applies to both vendors, but be warned: only a limited range of AMD's processors supported SSE2 during the early adoption phase, and that's where the last set of instructions comes in...
3DNow! instructions are only used by AMD parts. Within the confines of FLOPs/cycle, the functionality is identical to SSE instructions. Therefore, AMD chips that support 3DNow! but lack SSE support can still carry out 4 FLOPs per clock cycle for single precision. 3DNow! does not affect double precision. There are also AMD models that support both 3DNow! and SSE instructions. Why, you ask? The functionality of these instructions goes beyond FLOP improvements, and each offers features that the other doesn't. That is beyond the scope of what you're asking, but I felt it necessary to clarify to avoid confusion.
Both Intel and AMD like to calculate FLOPs/cycle with all instruction set extensions enabled, so I'd advise you to do the same.
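The rules above for older chips can be sketched as a small lookup. This is only an illustration: the function name `flops_per_cycle`, the extension strings and the `baseline` parameter (the FLOPs/cycle of the chip with no extensions) are assumptions of this sketch, and it only covers SSE, SSE2 and 3DNow!.

```python
def flops_per_cycle(extensions, precision, baseline=1.0):
    """FLOPs/cycle for an older chip, given its instruction set extensions.

    extensions -- iterable of names such as "SSE", "SSE2", "3DNow!"
    precision  -- "single" or "double"
    baseline   -- FLOPs/cycle with no extensions (assumed, chip-specific)
    """
    ext = {name.upper() for name in extensions}
    if precision == "single":
        # SSE and 3DNow! both give 4 single precision FLOPs/cycle
        return 4.0 if ext & {"SSE", "3DNOW!"} else baseline
    if precision == "double":
        # Only SSE2 improves double precision
        return 2.0 if "SSE2" in ext else baseline
    raise ValueError("precision must be 'single' or 'double'")


# Pentium 4 with all extensions enabled, as the vendors would count it
p4 = {"SSE", "SSE2"}
print(flops_per_cycle(p4, "single"), flops_per_cycle(p4, "double"))

# AMD chip with 3DNow! but no SSE/SSE2: double precision stays at baseline
print(flops_per_cycle({"3DNow!"}, "single"), flops_per_cycle({"3DNow!"}, "double"))
```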
With newer architectures, this need not be a concern. All Intel families from the Pentium III support SSE, and from the Pentium 4 support SSE2. All AMD families from the K6-2 support 3DNow!, and from the Athlon XP/MP, Duron and Sempron support SSE. SSE2 support in AMD chips didn't arrive until the Athlon 64 and its siblings, Sempron and Turion 64.
FLOPs/Cycle per Architecture
(Note the following list contains architecture names, not processor family names.)
- P5 & P6 (no ISEs) + Pentium Pro & Pentium II = 1 (single); 1 (double)
- P6 (Pentium III only) = 4 (single); 1 (double)
- NetBurst = 4 (single); 2 (double)
- Pentium M & Enhanced Pentium M = 4 (single); 2 (double)
- Core, Penryn, Nehalem & Westmere = 8 (single); 4 (double)
- Sandy Bridge & Ivy Bridge = 16 (single); 8 (double)
- Haswell, Broadwell, Skylake (LGA1151 & Mobile), Kaby Lake & Coffee Lake = 32 (single); 16 (double)
- Skylake ("Skylake-X" Core i7 & Core i9 [LGA2066]) = 64 (single); 32 (double)
- Skylake ("Skylake-SP" Xeon Bronze & Xeon Silver) = 32 (single); 16 (double)
- Skylake ("Skylake-SP" Xeon Gold & Xeon Platinum) = 64 (single); 32 (double)
- Bonnell, Saltwell, Silvermont & Airmont = 6 (single); 1.5 (double)
- MIC ("Knights Corner" Xeon Phi) = 32 (single); 16 (double)
- MIC ("Knights Landing" Xeon Phi) = 64 (single); 32 (double)
- K5 & K6 = 0.5 (single); 0.5 (double)
- K6-2 & K6-III = 4 (single); 0.5 (double)
- K7 & K8 = 4 (single); 2 (double)
- K10/Stars = 8 (single); 4 (double)
- Husky = 8 (single); 4 (double)
- [Note] Bulldozer, Piledriver, Steamroller & Excavator = 8 (single); 4 (double)
- Zen & Zen+ = 16 (single); 8 (double)
- Zen 2 & Zen 3 = 32 (single); 16 (double)
- Bobcat = 4 (single); 1.5 (double)
- Jaguar, Puma and Puma+ = 8 (single); 3 (double)
Note: Shared FPUs mean there's one FPU for every two cores. Despite what circulates online, AMD claims the Steamroller-based A10-7850K is capable of 856 SP GFLOPs; 737 of those come from the Radeon R7 integrated graphics, leaving 119 for the CPU. Reaching 119 SP GFLOPs requires 8 FLOPs per cycle. This should apply to all variants of Bulldozer, as the FPU design has remained identical throughout.
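The arithmetic behind that note can be checked by rearranging the GFLOPS equation. The 856 and 737 figures come from the note itself; the core count (4) and the 3.7 GHz base clock for the A10-7850K are not stated there and are filled in as assumptions for this sketch.

```python
total_gflops = 856.0   # AMD's claimed SP figure for the whole APU
gpu_gflops = 737.0     # share attributed to the Radeon R7 graphics

cores = 4              # assumption: A10-7850K CPU core count
frequency_ghz = 3.7    # assumption: base clock in GHz

# What is left for the CPU after removing the GPU share
cpu_gflops = total_gflops - gpu_gflops

# Rearranged equation: FLOPs/cycle = GFLOPS / (cores x GHz)
implied_flops_per_cycle = cpu_gflops / (cores * frequency_ghz)

print(f"CPU share: {cpu_gflops:.0f} SP GFLOPS")
print(f"Implied FLOPs/cycle per core: {implied_flops_per_cycle:.2f}")
```

Under those assumptions the result comes out at roughly 8, which matches the single precision figure listed for the Bulldozer family.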