According to the “Intel 64 and IA-32 Architectures Optimization Reference Manual,” April 2012, page 2-23:
The physical addresses of data kept in the LLC data arrays are distributed among the cache slices by a hash function, such that addresses are uniformly distributed. The data array in a cache block may have 4/8/12/16 ways corresponding to 0.5M/1M/1.5M/2M block size. However, due to the address distribution among the cache blocks from the software point of view, this does not appear as a normal N-way cache.
My computer has a 2-core Sandy Bridge CPU with a 3 MB, 12-way set-associative LLC. That does not seem consistent with Intel's documentation, though: going by the quoted figures, it seems I should have 24 ways. I imagine the number of cores and cache slices plays a role, but I can't quite figure it out. If I have 2 cores and hence 2 cache slices of 1.5 MB each, I would have 12 ways per slice according to Intel, and that does not seem consistent with my CPU specs. Can someone clarify this for me?
If I wanted to evict an entire cache line, would I need to access memory in strides of 128 KB or 256 KB? This is what I am ultimately trying to achieve.
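For reference, this is the arithmetic behind the two candidate strides. It is only a sketch: it assumes 64-byte cache lines (not stated above) and a conventional set-indexed cache where stride = size / ways, which the slice hash described by Intel may well invalidate. The function names are my own.

```python
KB = 1024
MB = 1024 * KB
LINE_SIZE = 64  # bytes; assumed, not taken from the manual excerpt


def way_stride(cache_bytes, ways):
    """Distance in bytes between addresses that share a set in a plain set-indexed cache."""
    return cache_bytes // ways


def num_sets(cache_bytes, ways, line_size=LINE_SIZE):
    """Number of sets for a given capacity, associativity and line size."""
    return cache_bytes // (ways * line_size)


# The three readings of the cache geometry discussed in the question.
configs = {
    "whole LLC, 12 ways": (3 * MB, 12),
    "whole LLC, 24 ways": (3 * MB, 24),
    "one slice, 12 ways": (3 * MB // 2, 12),
}

for name, (size, ways) in configs.items():
    sets = num_sets(size, ways)
    stride_kb = way_stride(size, ways) // KB
    print(f"{name}: {sets} sets, stride {stride_kb} KB")
```

Under these assumptions, 12 ways over the whole 3 MB gives a 256 KB stride, while both the 24-way reading and the per-slice 12-way reading give 128 KB. What I cannot tell is whether addresses separated by either stride end up in the same slice once the hash is applied.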
Any suggested readings are very welcome.