0

I'm compiling data to compare CPU and GPU GFLOP performance, and I'm currently looking at dual-socket CPUs (the E5-26xx family). After Broadwell comes the Skylake architecture, whose dual-processor families are Bronze and Silver, but those have half the cores and half the performance of the Broadwell ones. Am I missing something?

Moody
  • You are conflating the microarchitecture name (Broadwell, Skylake, Coffee Lake) with the marketing name (Xeon XXX). It's unclear what you are asking, considering that Intel broadly advertises the [Xeon Scalable](https://www.intel.com/content/www/us/en/products/processors/xeon/scalable/platinum-processors.html) processors. With 8+ sockets (28c/56t per socket) and up to 12TiB (!) of memory, they seem pretty awesome. Nominal frequency and core count may be a bit lower than LC arch, but that's necessary to keep the TDP manageable. – Margaret Bloom Feb 19 '18 at 21:54
  • @MargaretBloom: I'm pretty sure the question is asking why there aren't high-core-count Skylake-Xeon CPUs that only work in 2S systems. The only "2P"-rated Skylake Xeons are Bronze / Silver. – Peter Cordes Feb 20 '18 at 22:54
  • @PeterCordes Ah, OK. I was confused because even Gold/Platinum can be used in a 2S system. I guess Intel didn't want too many versions to maintain. – Margaret Bloom Feb 21 '18 at 08:05

2 Answers

3

Interesting, it seems you're right that the only high-core-count Skylake-server chips are also capable of being used in 4-socket systems. (https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Brands)

You can put Gold / Platinum CPUs in dual-socket systems. I assume most of what you're paying for in the high-core-count CPUs is the cores / cache themselves, so it's not a waste to use them in a 2-socket system.

SKX uses UPI instead of QPI as the interconnect between sockets. A CPU with 2 UPI links can be used in a 4P system, forming a ring instead of an all-to-all with 3 links in each CPU. Or a 2P system can use all 3 UPI links between the two sockets for more bandwidth. (Wikichip has diagrams)

Bronze / Silver, and Gold 5xxx CPUs have 2 UPI links, while Gold 6xxx and Platinum CPUs have 3 UPI links. (wikipedia)
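To make the link-count argument concrete, here is a minimal sketch (not anything from Intel's documentation) that models a 4-socket system as a graph and counts hops between sockets. The function names `ring`, `all_to_all`, `hops` and `summarize` are made up for this illustration, and it only models connectivity, not UPI bandwidth or latency.

```python
from collections import deque
from itertools import combinations


def ring(n):
    """Each socket links to its two neighbours (2 links per socket)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def all_to_all(n):
    """Each socket links directly to every other socket (n - 1 links per socket)."""
    return {i: set(range(n)) - {i} for i in range(n)}


def hops(topology, src, dst):
    """Minimum number of socket-to-socket links between src and dst (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in topology[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("sockets are not connected")


def summarize(topology):
    """Return (links per socket, worst-case hops, average hops)."""
    dists = [hops(topology, a, b) for a, b in combinations(topology, 2)]
    links = max(len(peers) for peers in topology.values())
    return links, max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    for name, topo in (("4P ring", ring(4)), ("4P all-to-all", all_to_all(4))):
        links, worst, avg = summarize(topo)
        print(f"{name}: {links} links/socket, worst {worst} hops, avg {avg:.2f} hops")
```

In this toy model the 2-link ring needs two hops to reach the socket on the opposite side, while the 3-link all-to-all layout reaches every socket in one hop.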

Inside each Skylake-SP CPU (on a single die) the interconnect between cores is a mesh, vs. a ring bus in Broadwell and earlier.


4P/8P Broadwell (and earlier) Xeons have a small (14kiB? I can't find a more detailed description right now) snoop filter cache (see John McCalpin's post in this thread). 2P chips don't have one; when a load misses in L3, they just broadcast snoop requests to the other socket while they load from local DRAM. This "uses a modest fraction of the QPI bandwidth". (The exact snoop behaviour is configurable, with different modes to optimize for low-latency local memory vs. less-bad latency for remote memory, and so on.)

Thus there is a hardware (not just artificial marketing / market-segmentation) difference between 2P and 4P/8P chips with the same core count for Broadwell and earlier.


Skylake-SP always has a snoop filter. See the Directory-Based Coherency section in Intel's paper on Skylake-Xeon internals.

(IDK the details. Maybe the Bronze/Silver chips are weaker, but their marketing department decided it wasn't worth doing finer-grained market segmentation within the Gold chips.)

Peter Cordes
1

You are not missing anything in terms of Intel CPU generation code names, but your statement about "half the performance" is unclear. In particular, which exact SKUs are you comparing? And why did you choose to compare exactly those products from different generations? The official database is at http://ark.intel.com, where you can find models from the same market segment in different generations.
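As a rough way to put numbers on such a comparison, theoretical peak throughput is usually estimated as sockets × cores × clock × FLOPs per core per cycle. The sketch below assumes double precision and counts an FMA as two FLOPs. The core counts and clocks in it are hypothetical placeholders, not real SKU data, so substitute values from ARK. The number of FMA units per core depends on the SKU, and sustained AVX clocks are typically below the nominal frequency, so treat the result as an upper bound.

```python
def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: sockets * cores * GHz * FLOPs per core per cycle."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


# Double-precision FLOPs per core per cycle:
# SIMD lanes * 2 (an FMA is a multiply plus an add) * number of FMA units.
AVX2_2FMA = 4 * 2 * 2    # 256-bit vectors, two FMA units
AVX512_1FMA = 8 * 2 * 1  # 512-bit vectors, one FMA unit
AVX512_2FMA = 8 * 2 * 2  # 512-bit vectors, two FMA units

# Hypothetical 2-socket configurations: (sockets, cores, GHz, FLOPs/cycle).
# These are illustrative placeholders, not specifications of real parts.
systems = {
    "older 2S, AVX2": (2, 22, 2.2, AVX2_2FMA),
    "newer 2S, AVX-512, 1 FMA unit": (2, 12, 2.1, AVX512_1FMA),
    "newer 2S, AVX-512, 2 FMA units": (2, 22, 2.1, AVX512_2FMA),
}

if __name__ == "__main__":
    for name, params in systems.items():
        print(f"{name}: {peak_gflops(*params):.1f} GFLOP/s peak")
```

Comparing models from the same market segment this way avoids attributing a segment difference (core count, FMA units) to the microarchitecture itself.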

Grigory Rechistov
  • Are the opinions expressed here solely your own, rather than the views or opinions of your employer? – osgx Mar 09 '18 at 23:27