
Modern last-level caches (LLCs) are divided into slices. I have read several introductions to this, but I still can't figure out how addresses are mapped onto the slices.

[Figure from the paper: how physical-address bits select the LLC slice and the cache set within it (image not reproduced)]

This is how a paper introduces slices: all bits other than the line offset are hashed to obtain the slice ID. The LLC is, of course, indexed by physical address. My server's cache parameters are listed below. It has 24 physical cores, so it has 24 slices, each located next to a core.

LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                8
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 32768
LEVEL1_DCACHE_ASSOC                8
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  262144
LEVEL2_CACHE_ASSOC                 8
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  31457280
LEVEL3_CACHE_ASSOC                 20
LEVEL3_CACHE_LINESIZE              64
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

It has two sockets, each with 12 physical cores.

 NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
 NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47

According to the parameters above, my LLC is 31457280 bytes, each cache line is 64 bytes, and the cache is 20-way set-associative. So there are 31457280 / 64 / 20 = 24576 cache sets. The 12 physical cores on each socket share one LLC, so each slice has 24576 / 12 = 2048 cache sets. Which of the following interpretations is correct? I lean toward the first one.
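The arithmetic above can be rechecked with a few lines of Python, using the getconf values from the question. The one-slice-per-core count (12 per socket) is the question's own assumption, and the helper `bits_needed` is just an illustrative name. The last two results bear on option 2 below: a single uniform index over all 24576 sets would need 15 bits, and 24576 is not a power of two.

```python
# Recompute the LLC geometry from the getconf output in the question.
LLC_SIZE = 31457280       # LEVEL3_CACHE_SIZE in bytes (per socket)
LINE_SIZE = 64            # LEVEL3_CACHE_LINESIZE in bytes
WAYS = 20                 # LEVEL3_CACHE_ASSOC
SLICES_PER_SOCKET = 12    # assumption from the question: one slice per core


def bits_needed(n: int) -> int:
    """Smallest number of bits that can encode n distinct values (0..n-1)."""
    return (n - 1).bit_length()


total_sets = LLC_SIZE // (LINE_SIZE * WAYS)
sets_per_slice = total_sets // SLICES_PER_SOCKET

offset_bits = bits_needed(LINE_SIZE)             # bits for the byte offset in a line
slice_index_bits = bits_needed(sets_per_slice)   # bits for a per-slice set index
global_index_bits = bits_needed(total_sets)      # bits for one uniform set index

print(total_sets, sets_per_slice)                # 24576 2048
print(offset_bits, slice_index_bits)             # 6 11
print(global_index_bits)                         # 15
print(total_sets & (total_sets - 1) == 0)        # False: not a power of two
```

So under option 1 the line offset occupies bits 0-5 and the per-slice set index occupies the next 11 bits, i.e. bits 6-16.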

  1. The set index within each slice is numbered independently. Bits 6-16 of the physical address then index the cache set, and all bits except the line offset are hashed to find the slice ID.

  2. The set index is numbered uniformly across all slices. But 24576 sets would need 15 bits of index, and 24576 = 3 × 2^13 is not a power of two, so this does not seem to correspond cleanly to a range of address bits.
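To make option 1 concrete, here is a minimal Python sketch that splits a physical address into (slice, set index within the slice, line offset), assuming 64-byte lines, 2048 sets per slice, and 12 slices per socket as derived above. The slice hash is a hypothetical placeholder (`toy_slice_hash`): Intel does not document the real function (the research linked in the comments reverse-engineers it), and this toy XOR-fold is not it. It only illustrates that the slice choice can depend on every bit above the line offset while the set index stays a plain bit field.

```python
# Sketch of option 1: a contiguous set index inside each slice, plus a
# separate hash of the upper address bits that picks the slice.
OFFSET_BITS = 6      # log2(64-byte line)
SET_BITS = 11        # log2(2048 sets per slice)
NUM_SLICES = 12      # slices per socket (one per core, as assumed in the question)


def line_offset(paddr: int) -> int:
    """Byte offset within the 64-byte line: bits 0-5."""
    return paddr & ((1 << OFFSET_BITS) - 1)


def set_index(paddr: int) -> int:
    """Set index within a slice: the contiguous field in bits 6-16."""
    return (paddr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)


def toy_slice_hash(paddr: int) -> int:
    """HYPOTHETICAL slice hash, NOT Intel's real (undocumented) function.

    XOR-folds all bits above the line offset into 16 bits, then reduces
    the result modulo the slice count.
    """
    line = paddr >> OFFSET_BITS
    folded = 0
    while line:
        folded ^= line & 0xFFFF
        line >>= 16
    return folded % NUM_SLICES


def decompose(paddr: int) -> tuple[int, int, int]:
    """Return (slice, set index within that slice, line offset)."""
    return toy_slice_hash(paddr), set_index(paddr), line_offset(paddr)


if __name__ == "__main__":
    for addr in (0x0, 0x1FFC0, 0x12345678):
        s, idx, off = decompose(addr)
        print(f"{addr:#010x}: slice {s:2d}, set {idx:4d}, offset {off:2d}")
```

Swapping in a reverse-engineered hash would only change `toy_slice_hash`; the set-index and offset fields stay the same under option 1.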

Peter Cordes
Yujie
  • Neither; the index function is a hash function of more bits, probably all the bits above the offset-within-line bits. I think [According to Intel my cache should be 24-way associative though its 12-way, how is that?](https://stackoverflow.com/q/37162132) covers this enough to be a duplicate; especially the links in comments to further research (https://github.com/alex10791/cssarev) about the hash function. (But I'm partly skimming here, correct me if I missed something that makes this a non-dup). – Peter Cordes May 10 '21 at 01:46
  • Before this question, I had actually looked at the ones you mentioned, but I'm still not sure. Do you mean that both the index and the slice ID are obtained through a hash? Look at the picture above, though. It comes from a top conference paper (Last-Level Cache Side-Channel Attacks are Practical). According to this picture, the set index is taken from the middle bits, and the slice ID is obtained from the hash. Is this incorrect? – Yujie May 10 '21 at 02:09
  • Oh, yeah I see, that's different from how I previously *thought* it worked, but maybe I was wrong. The diagram is showing that within one slice, indexing is "normal", taking a contiguous range of bits above the cache-line offset. That actually makes a lot of sense as a design, with mapping to a slice being separate from indexing instead of *part of* indexing like I was previously imagining. So the cache can still hold more than 12 lines at 2M offsets from each other, for example, as long as they don't all map to one slice. That would be your option 1. – Peter Cordes May 10 '21 at 02:31
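The last comment's point can be checked with a small Python sketch under option 1, again using a hypothetical XOR-fold slice hash rather than Intel's real one. It takes 24 lines spaced 2 MiB apart: bits 6-16 are zero for all of them, so they share one in-slice set index, yet the toy hash spreads them across several slices. Because 24 exceeds the 20 ways of a single set, they could not all stay cached if they landed in one slice.

```python
# Lines 2 MiB apart share a set index inside a slice (option 1),
# but a slice hash over the upper bits can spread them across slices.
OFFSET_BITS, SET_BITS, NUM_SLICES, WAYS = 6, 11, 12, 20
STRIDE = 2 * 1024 * 1024   # 2 MiB


def set_index(paddr: int) -> int:
    """Set index within a slice: bits 6-16."""
    return (paddr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)


def toy_slice_hash(paddr: int) -> int:
    """HYPOTHETICAL XOR-fold hash, not Intel's real slice function."""
    line, folded = paddr >> OFFSET_BITS, 0
    while line:
        folded ^= line & 0xFFFF
        line >>= 16
    return folded % NUM_SLICES


addrs = [k * STRIDE for k in range(24)]   # more lines than WAYS
set_indices = {set_index(a) for a in addrs}
slices = {toy_slice_hash(a) for a in addrs}

print("distinct set indices:", len(set_indices))   # 1
print("distinct slices:", len(slices))             # 12 with this toy hash
# In one slice only WAYS (20) of these 24 lines fit in their set; spread
# over several slices, they compete for WAYS * len(slices) ways instead.
print("ways available across those slices:", WAYS * len(slices))
```

With a different hash the number of slices reached would change, but the set index would still be identical for all of these addresses.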

0 Answers