
I read an introduction to the last level cache (LLC) in a paper. It says that the LLC consists of many slices, each of which is like a traditional set-associative cache. A cache set is located by its set index and slice ID (as shown in the figure below).

[Figure from the paper: an LLC divided into slices, with a set located by set index and slice ID]

I want to check how many slices my server has, and the details of each slice (how many cache sets it has and how many cache lines are in each set). The method I found on Google is to look at the cache information in the folder /sys/devices/system/cpu/cpu0/cache/index3.

But after checking, I found that the information in this folder is no different from that for L1 and L2. The folder contains these files:

coherency_line_size  level           physical_line_partition  shared_cpu_list  size  uevent
id                   number_of_sets  power                    shared_cpu_map   type  ways_of_associativity
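A quick way to see what those files describe is to multiply them out. On x86 Linux they are derived from CPUID leaf 4, where the cache size equals sets × ways × partitions × line size. If the product matches `size`, then `number_of_sets` counts the sets of the whole L3 (all slices together), not of one slice; dividing it by the slice count from the answer below gives the sets per slice, assuming sets are spread evenly across slices. This is a sketch that assumes the standard sysfs layout listed above; `l3_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: summarize an L3 (index3) sysfs directory and compute
# sets * ways * partitions * line size in KiB, to compare with the "size" file.
l3_summary() {
    l3_dir=${1:-/sys/devices/system/cpu/cpu0/cache/index3}
    l3_sets=$(cat "$l3_dir/number_of_sets")
    l3_ways=$(cat "$l3_dir/ways_of_associativity")
    l3_parts=$(cat "$l3_dir/physical_line_partition")
    l3_line=$(cat "$l3_dir/coherency_line_size")
    l3_size=$(cat "$l3_dir/size")          # reported with a K suffix, e.g. "20480K"
    l3_kib=$(( l3_sets * l3_ways * l3_parts * l3_line / 1024 ))
    echo "sets=$l3_sets ways=$l3_ways line=${l3_line}B size=$l3_size computed=${l3_kib}K"
}
```

Running `l3_summary` with no argument reads cpu0's L3 directory; if `size` and `computed` agree, `number_of_sets` covers the entire L3 of that socket.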

How can I check how many slices the server has? Is the number_of_sets shown here the number of sets in one slice?

I am using a server running Linux version 4.15.0-122-generic (buildd@lcy01-amd64-010) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #124~16.04.1-Ubuntu SMP.

Peter Cordes
Gerrie
  • The info you found is specific to *Intel*'s Sandybridge-family CPUs, which put a slice of L3 next to every core. But the whole thing acts as one large L3. No, each slice holds many sets (of 12 lines, for a 12-way associative cache: [Which cache mapping technique is used in intel core i7 processor?](https://stackoverflow.com/q/49092541) / [According to Intel my cache should be 24-way associative though its 12-way, how is that?](https://stackoverflow.com/q/37162132)). – Peter Cordes Dec 08 '20 at 12:59
  • See also https://www.uops.info/dissertation.pdf, page 86ff and https://github.com/andreas-abel/nanoBench/tree/master/tools/CacheAnalyzer – Andreas Abel Dec 08 '20 at 17:27
  • Thank you for your comments. But how can I tell whether the CPU is an Ivy Bridge or a Sandy Bridge? If it is an Ivy Bridge, is the LLC still divided into slices? I tried the tools recommended above, but got the error "CPU 0 cannot read MSR 0x00000396". – Gerrie Dec 09 '20 at 02:15
  • Ivybridge is a member of the Sandybridge *family*. So is Skylake, even Ice Lake. But earlier Intel CPUs like Core 2 and Nehalem were different. And AMD Zen CPUs are also very different: clusters of 4 or 8 cores each sharing an L3. – Peter Cordes Dec 10 '20 at 13:21

1 Answer


Haswell and Broadwell

The following processor collections use the server uncore microarchitecture:

  • Core X
  • Xeon E5
  • Xeon E7

On these processors, the number of L3 cache slices can be obtained as follows:

Step 1: Run the following command on a Linux terminal:

lspci | grep Power

This lists the PCI device functions of the PCU (Power Control Unit) of each processor package in your system. Example output on a dual-socket Xeon E5 v4 system looks like this:

df:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)

The device functions on bus df belong to one socket and those on bus ff belong to the other. The only information needed from this output is the bus number. Take bus df as an example.
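Picking out the bus numbers can be automated. This sketch assumes the default lspci line format shown above (bus:device.function, optionally prefixed with a domain when `lspci -D` is used); `pcu_buses` is a hypothetical helper name.

```shell
# Hypothetical helper: read lspci-style lines on stdin and print the unique
# bus numbers of the PCU device functions (normally one bus per socket).
pcu_buses() {
    awk '/Power Control Unit/ {
        n = split($1, f, ":")   # "df:1e.0", or "0000:df:1e.0" with lspci -D
        print f[n - 1]          # the field just before device.function
    }' | sort -u
}
```

On the dual-socket example above, `lspci | pcu_buses` would print `df` and `ff`.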

Step 2: Run the following command:

sudo setpci -s df:1e.3 98.l

An example output is 46000f2e. Bits 0-23 of this value form a bit vector in which a bit value of one indicates an enabled L3 cache slice and a bit value of zero indicates a disabled one. You can't actually disable any slices yourself; only the PCU can do that during package C-state transitions. During normal operation, all available slices are enabled, so the number of set bits is the number of slices. In this example, the bit vector 000f2e has 8 bits set, so the selected processor has 8 slices.
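The decoding step (keep the low bits, count the ones) can be written as a small function. This is a sketch assuming the value is passed as bare hex exactly as setpci prints it (no `0x` prefix); `count_slices` is a hypothetical helper name.

```shell
# Hypothetical helper: count_slices HEX WIDTH
# Keeps bits 0..WIDTH-1 of the register value and prints how many are set.
count_slices() {
    cs_val=$(( 0x$1 & ((1 << $2) - 1) ))
    cs_n=0
    while [ "$cs_val" -ne 0 ]; do
        cs_n=$(( cs_n + (cs_val & 1) ))   # add the lowest bit
        cs_val=$(( cs_val >> 1 ))         # move to the next bit
    done
    echo "$cs_n"
}
```

For example, `count_slices "$(sudo setpci -s df:1e.3 98.l)" 24` reads and decodes the register in one go; the same function works with a width of 15 or 28 for the other generations described below.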

Usually, all Intel processors in the same shared-memory system are identical, but you can repeat the above steps for each processor if you want.

In general, there can be up to 24 slices, each up to 2.5 MiB in size.

Sandy Bridge and Ivy Bridge

The following processor collections use the server uncore microarchitecture:

  • Core X
  • Xeon E5
  • Xeon E7

Step 1: The same as before.

Step 2: Run the following command:

sudo setpci -s XX:0a.3 94.l

where XX is the bus number from Step 1. Bits 0-14 represent the cache slice bit vector. In general, there can be up to 15 slices, each up to 2.5 MiB in size.

I'm not sure whether this method works on Sandy Bridge processors or on the Ivy Bridge Core X series, but there is no harm in trying it.

Skylake, Cascade Lake, and Cooper Lake

The following processor collections use the server uncore microarchitecture:

  • Core X
  • Xeon SP
  • Xeon W
  • Xeon D

Step 1: The same as before.

Step 2: Run the following command (thanks to @JohnDMcCalpin):

sudo setpci -s XX:1e.3 9c.l

where XX is the bus number from Step 1. Bits 0-27 represent the cache slice bit vector. In general, there can be up to 28 slices, each 1.375 MiB in size.
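The three register locations given in this answer differ only in device/function, register offset, and bitmap width. A small lookup keeps them in one place; the function name `slice_reg_for` and the generation keywords are made up for this sketch, and the values are copied from the setpci commands above.

```shell
# Hypothetical helper: print "device.function register width" of the
# slice bitmap register for a microarchitecture keyword.
slice_reg_for() {
    case "$1" in
        sandybridge|ivybridge)          echo "0a.3 94.l 15" ;;
        haswell|broadwell)              echo "1e.3 98.l 24" ;;
        skylake|cascadelake|cooperlake) echo "1e.3 9c.l 28" ;;
        *) echo "no known slice bitmap register for '$1'" >&2; return 1 ;;
    esac
}
```

For example, after `set -- $(slice_reg_for broadwell)`, running `sudo setpci -s "df:$1" "$2"` reads the register on bus df, and the low `$3` bits hold the slice bitmap.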

All Skylake, Cascade Lake, and Cooper Lake processor models with a server uncore have L3 caches made of 1.375 MiB slices, so the number of slices is the total cache size divided by 1.375 MiB. I'm not aware of any exceptions.
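That division can be applied directly to the `size` file in sysfs mentioned in the question. A sketch, assuming the size is reported in KiB with a trailing K (for example "19712K") and that the slice size is passed in KiB: 1408 for 1.375 MiB, or 1536 if the 1.5 MiB Ice Lake estimate below holds. `slices_from_size` is a hypothetical helper name.

```shell
# Hypothetical helper: slices_from_size SIZE SLICE_KIB
# SIZE is a sysfs-style L3 size such as "19712K"; prints SIZE / SLICE_KIB,
# or fails if the size is not an exact multiple of the slice size.
slices_from_size() {
    sz_kib=${1%K}                       # strip the trailing K
    if [ $(( sz_kib % $2 )) -ne 0 ]; then
        echo "size $1 is not a multiple of $2 KiB" >&2
        return 1
    fi
    echo $(( sz_kib / $2 ))
}
```

For example, `slices_from_size "$(cat /sys/devices/system/cpu/cpu0/cache/index3/size)" 1408` estimates the slice count of cpu0's socket; an L3 of 19712K gives 14 slices.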

Ice Lake

The following processor collections use the server uncore microarchitecture:

  • Xeon SP

It seems to me that the slice size is 1.5 MiB on these processors.

Nehalem and Westmere

The following processor collections use the server uncore microarchitecture with a distributed L3 cache:

  • Xeon 6500
  • Xeon 7500
  • Xeon E7

The slice size can be up to 3 MiB on these processors.

All other Intel processors with a server uncore design

They don't use a distributed cache architecture, so the concept of a slice doesn't apply to these processors.

Hadi Brais
  • The CAPID6 register is also present in SKX/CLX processors, with a 28-bit bitmap of the enabled LLC slices. On my 2-socket systems, the registers can be read with "setpci -s 17:1e.3 0x9c.l; setpci -s 85:1e.3 0x9c.l" – John D McCalpin Dec 10 '20 at 17:40
  • @JohnDMcCalpin Thanks, good find. That register is not documented for SKX/CLX as far as I can tell. – Hadi Brais Dec 10 '20 at 18:02
  • CAPID6 is documented in section 1.7.1 of the Xeon Scalable Memory Family Uncore Performance Monitoring Guide (336274-001). – John D McCalpin Dec 11 '20 at 21:10
  • @JohnDMcCalpin I see. I thought it'd be documented in the Datasheet Volume 2 as usual, but I guess they decided to put it only in the uncore guide. – Hadi Brais Dec 11 '20 at 21:16
  • When I run setpci | grep Power, I get the error "setpci: No operation specified". What should I do? Thanks! – Yujie May 09 '21 at 14:44
  • setpci | grep Power gives "setpci: No operation specified"? Did you mean lspci? – MappaM Jun 13 '23 at 13:47