According to my understanding, in a 64-bit system the memory bus is 64 bits wide, so each transfer between the memory controller (MC) and DRAM is 64 bits. Since the cache block size is 64 bytes, is there a single transfer of a cache-block-sized chunk from the MC to the cache (i.e., a data path as wide as the cache block), or does it take 8 transfers of 64 bits each (i.e., a data path width of 64 bits, the same as the memory bus)?
-
DDR width is indeed 64 bits and can do a 64-byte transfer with a burst transaction. The link between the cache and the MC (which is also inside the CPU) is wider (32/64B). – Margaret Bloom Jan 29 '20 at 08:45
1 Answer
Usually somewhere in between. e.g. the uncore ring bus in mainstream Intel CPUs (before Skylake Xeon) is 32 bytes wide in each direction, so a burst transfer of a cache line from one interconnect stop to the next takes 2 cycles. Intel could have replaced the bidirectional 32-byte data ring with a unidirectional 64-byte ring (with potentially a little less area), but that would significantly increase the worst-case latency of all uncore transactions, especially L3 hits. Such a 64-byte design would also scale poorly with the number of cores, since the worst-case latency keeps growing. (Why isn't there a data bus which is as wide as the cache line size?). Intel Knights Ferry used a ring that is 64 bytes wide in each direction. This massive interconnect is made possible by having smaller cores, no dedicated L3 cache, and a much larger die compared to processors of the same generation (or even newer ones). The interconnect width in the more recent Intel processors is probably either 32 bytes or 64 bytes in each direction. (See: What is the data width of the mesh in SKX?)
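To make the "cycles per cache line" arithmetic concrete, here is a minimal C sketch that just divides the line size by a link's per-cycle width. The widths in the table are illustrative examples only, not confirmed numbers for any specific uarch:

```c
#include <stdio.h>

/* Cycles needed to move one cache line across a link that carries
 * `width_bytes` per cycle (rounded up). */
static unsigned cycles_per_line(unsigned line_bytes, unsigned width_bytes)
{
    return (line_bytes + width_bytes - 1) / width_bytes;
}

int main(void)
{
    const unsigned line = 64;                    /* cache line size in bytes */
    const unsigned widths[] = { 16, 32, 64 };    /* hypothetical per-cycle path widths */

    for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++)
        printf("%2u-byte path: %u cycle(s) per 64-byte line\n",
               widths[i], cycles_per_line(line, widths[i]));
    return 0;
}
```

With a 32-byte path that gives the 2 cycles per line mentioned above; a 64-byte path would cut it to 1 at the cost of the latency/scaling trade-offs described.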
The L2-L3 bus (i.e., the path that connects a private L2 cache with its associated L3 slice) has lower bandwidth than the interconnect itself, so the L2-L3 bus could have a smaller width (16 bytes is plausible). The path from L2 -> L1 data cache is 64 bytes wide in Skylake. Some non-Intel sources make stronger claims about the widths of these paths, but they may or may not be accurate.
And BTW, the memory bus has been 64 bits since SDRAM / DDR1, before x86-64 was a thing. The memory bus itself works in burst transfers: with a typical DDR3/DDR4 burst length of 8, a 64-bit bus delivers 64 bytes, i.e. a whole cache line, per burst.
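As a back-of-the-envelope check of that burst arithmetic, the sketch below multiplies the 8-byte bus width by a DDR3/DDR4-style burst length of 8 to recover the 64-byte line; the 1600 MT/s transfer rate is just an example figure (DDR3-1600), not something implied by the answer:

```c
#include <stdio.h>

int main(void)
{
    const unsigned bus_bits     = 64;   /* DDR data bus width */
    const unsigned burst_length = 8;    /* typical DDR3/DDR4 burst length */

    unsigned bytes_per_burst = (bus_bits / 8) * burst_length;
    printf("one burst = %u bytes (a full cache line)\n", bytes_per_burst);

    /* Peak transfer rate at an assumed 1600 MT/s (DDR3-1600, for example). */
    double mt_per_s  = 1600e6;
    double peak_gb_s = mt_per_s * (bus_bits / 8) / 1e9;
    printf("peak bandwidth at 1600 MT/s: %.1f GB/s\n", peak_gb_s);
    return 0;
}
```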
And unrelated to memory-bus width, x86 since the 32-bit P5 Pentium has guaranteed that aligned 8-byte (64-bit) accesses are atomic (only possible using x87 or MMX on that uarch). That guarantee comes from load/store execution-port width and cache-to-cache transfer protocols.
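If you want to rely on that guarantee from C rather than hand-written asm, a minimal sketch using C11 atomics is below. The variable name is made up; the point is only that a naturally aligned 8-byte load or store compiles to a single instruction that x86 performs atomically (on 32-bit targets the compiler reaches for x87/MMX/SSE to get one 8-byte access, matching the uarch note above):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* A naturally aligned 64-bit object: an aligned 8-byte load or store of
 * this is a single, non-torn memory access on x86.  C11 atomics make the
 * compiler preserve that single-access property (no splitting/caching). */
static _Alignas(8) _Atomic uint64_t shared_counter = 0;

int main(void)
{
    /* Relaxed ordering: we only rely on the access itself being atomic,
     * not on any ordering with other memory operations. */
    atomic_store_explicit(&shared_counter, 0x1122334455667788u,
                          memory_order_relaxed);
    uint64_t v = atomic_load_explicit(&shared_counter, memory_order_relaxed);

    printf("read back 0x%016llx in one atomic 8-byte access\n",
           (unsigned long long)v);
    return 0;
}
```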

-
Do we know for sure that the mesh in Skylake Xeon (and/or Xeon Phi) supports 64 bytes per cycle? Is there an Intel source for this? [WikiChip](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)) says that the L2-L3 bus is 64B wide. I'm not sure where this info came from. That also doesn't necessarily mean that the data network of the mesh is 64B wide. – Hadi Brais Jan 29 '20 at 17:03
-
According to Slide 11 of [these](https://web.archive.org/web/20120402211714/http://www.many-core.group.cam.ac.uk/ukgpucc2/talks/Elgar.pdf) slides, KNF had a 128-byte bidirectional ring bus, 64 bytes in each direction. – Hadi Brais Jan 29 '20 at 20:11
-
@HadiBrais: I didn't intend to make any claims about the width of the mesh in SKX or KNL / KNM. I excluded them from my first paragraph because they don't have a ring bus, not because I know anything about width. Thanks for digging up links to unconfirmed info. – Peter Cordes Jan 29 '20 at 23:11