Haswell and Broadwell
The following processor collections use the server uncore microarchitecture:
On these processors, the number of L3 cache slices can be obtained as follows:
Step 1: Run the following command on a Linux terminal:
lspci | grep Power
This lists the PCI device functions of the power control unit (PCU) of each processor package in your system. Example output on a dual-socket Xeon E5 v4 system looks like this:
df:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1f.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1f.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
The device functions on bus df are on one processor and those on bus ff are on the other. The only piece of information needed is the bus number. Take bus df, for example.
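If you would rather not pick the bus numbers out by eye, here is a minimal Python sketch that extracts them from the lspci text. The function name pcu_buses and the regular expression are my own illustration, not part of any tool, and the sample lines are copied from the output above.

```python
import re

# Matches the bus:device.function prefix printed by lspci, e.g. "ff:1e.3".
# An optional 4-digit PCI domain prefix such as "0000:" is tolerated.
BDF_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):[0-9a-fA-F]{2}\.[0-7]\s")


def pcu_buses(lspci_text):
    """Return the distinct bus numbers of PCU lines, in order of first appearance."""
    buses = []
    for line in lspci_text.splitlines():
        if "Power Control Unit" not in line:
            continue  # skip anything that is not a PCU device function
        match = BDF_RE.match(line.strip())
        if match:
            bus = match.group(1).lower()
            if bus not in buses:
                buses.append(bus)
    return buses


# Three lines taken from the example output above.
sample = """\
df:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
df:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
ff:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 01)
"""

print(pcu_buses(sample))  # one entry per processor package
```

On a real system you would pass in the full lspci output instead of the sample string.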
Step 2: Run the following command:
sudo setpci -s df:1e.3 98.l
An example output is 46000f2e. Bits 0-23 of this value form a bit vector in which a bit value of one indicates an enabled L3 cache slice and a bit value of zero indicates a disabled cache slice. You can't actually disable any slices yourself; only the PCU can do that during package C-state transitions. During normal operation, all the available slices are enabled. Therefore, the number of set bits is the number of slices. In this example, the bit vector 000f2e has 8 bits set, so the number of slices on the selected processor is 8.
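Counting the set bits by hand is error-prone, so here is a small Python sketch of the same arithmetic. The function name count_slices and its vector_bits parameter are just illustrative; the value is the example register value from above.

```python
def count_slices(register_value, vector_bits=24):
    """Count the set bits in the low `vector_bits` bits of the register value."""
    mask = (1 << vector_bits) - 1          # 0xffffff for the 24-bit vector
    return bin(register_value & mask).count("1")


value = int("46000f2e", 16)                # the string printed by setpci
print(hex(value & 0xFFFFFF))               # 0xf2e, the slice bit vector
print(count_slices(value))                 # 8
```

The upper bits (46 in this example) are outside the slice bit vector, which is why they are masked off before counting.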
Usually, all Intel processors in the same shared-memory system are identical, but you can repeat the above steps for each processor if you want.
In general, there can be up to 24 slices, each up to 2.5 MiB in size.
Sandy Bridge and Ivy Bridge
The following processor collections use the server uncore microarchitecture:
Step 1: The same as before.
Step 2: Run the following command:
sudo setpci -s XX:0a.3 94.l
where XX is the bus number from Step 1. Bits 0-14 represent the cache slice bit vector. In general, there can be up to 15 slices, each up to 2.5 MiB in size.
I'm not sure whether this method works on the Sandy Bridge processors and Ivy Bridge's Core X series, but there is no harm in checking if it works.
Skylake, Cascade Lake, and Cooper Lake
The following processor collections use the server uncore microarchitecture:
- Core X
- Xeon SP
- Xeon W
- Xeon D
Step 1: The same as before.
Step 2: Run the following command (thanks to @JohnDMcCalpin):
sudo setpci -s XX:1e.3 9c.l
where XX is the bus number from Step 1. Bits 0-27 represent the cache slice bit vector. In general, there can be up to 28 slices, each 1.375 MiB in size.
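The three register locations given so far differ only in the device function, the offset, and the width of the bit vector, so they can be collected in one table. The following Python sketch does that; the dictionary keys and function names are made up for illustration, and the Sandy Bridge entry carries the same uncertainty mentioned above.

```python
# generation -> (device.function, register offset, number of bits in the vector)
# All values are the ones quoted in the text; the key names are arbitrary.
SLICE_REGISTERS = {
    "haswell_broadwell": ("1e.3", "98", 24),
    "sandy_ivy_bridge": ("0a.3", "94", 15),   # may not work on every model
    "skylake_cascade_cooper": ("1e.3", "9c", 28),
}


def setpci_command(generation, bus):
    """Build the setpci argument list for the given generation and bus number."""
    devfn, offset, _bits = SLICE_REGISTERS[generation]
    return ["sudo", "setpci", "-s", f"{bus}:{devfn}", f"{offset}.l"]


def slices_from_output(generation, output):
    """Count enabled slices from the hex string that setpci prints."""
    _devfn, _offset, bits = SLICE_REGISTERS[generation]
    vector = int(output.strip(), 16) & ((1 << bits) - 1)
    return bin(vector).count("1")


print(" ".join(setpci_command("haswell_broadwell", "df")))
print(slices_from_output("haswell_broadwell", "46000f2e"))  # 8
```

The sketch only builds the command and decodes its output; running it (for example with subprocess.run) still requires root privileges.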
All processor models in these collections that Intel has released so far have L3 caches consisting of 1.375 MiB slices, so the number of slices is the total cache size divided by 1.375 MiB. I'm not aware of any exceptions.
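As a quick check of that division, here is a Python sketch that uses exact fractions to avoid floating-point rounding. The function name is hypothetical and the cache sizes passed in are illustrative inputs, not figures taken from this answer.

```python
from fractions import Fraction

SLICE_MIB = Fraction(11, 8)  # 1.375 MiB per slice, as stated above


def slices_from_cache_size(total_mib):
    """Divide the total L3 size (in MiB) by the 1.375 MiB slice size."""
    slices = Fraction(str(total_mib)) / SLICE_MIB
    if slices.denominator != 1:
        # Not a whole number of slices: the 1.375 MiB assumption does not hold.
        raise ValueError(f"{total_mib} MiB is not a multiple of 1.375 MiB")
    return int(slices)


print(slices_from_cache_size("38.5"))   # 28
print(slices_from_cache_size("13.75"))  # 10
```

A ValueError here suggests the processor belongs to a different generation with another slice size.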
Ice Lake
The following processor collections use the server uncore microarchitecture:
It seems to me that the slice size is 1.5 MiB on these processors.
Nehalem and Westmere
The following processor collections use the server uncore microarchitecture with a distributed L3 cache:
- Xeon 6500
- Xeon 7500
- Xeon E7
The slice size can be up to 3 MiB on these processors.
All other Intel processors with a server uncore design
They don't use a distributed cache architecture, so the concept of slice doesn't exist on these processors.