Continuing my study of cache-as-RAM (CAR) mode on x64 cores: I have not seen anything, in either discussions or documentation, saying whether both logical cores of a hyperthreaded pair on an i3 must be set up for cache-as-RAM execution, or neither. Is a mix possible, or should I divide this between physical i3 cores?
What are you hoping to achieve? Do you want the logical cores to communicate with each other or do independent tasks? What task is one of them doing where no-fill mode might help? Also, logical cores of the same physical core share L1d and L2 caches, so evictions and fills from one core could evict the lines set up by the other. If you used separate physical cores, only L3 would be shared. – Peter Cordes Jul 22 '23 at 17:47
-
I want to run a debugging environment on one core, with the other core being the device under test. But when you say that the logical cores share the L1 and L2 caches, is it all or nothing? Can I have some of the cache go into no-fill mode and the rest of it be used conventionally by the other logical core? – eternalsquire Jul 22 '23 at 17:58
-
If it's supported on a per-logical-core basis at all, I'd guess that it works like one core's accesses never cause fills or evictions (because you put it in no-fill mode), and the other core's accesses will evict and fill on cache miss as normal. So by core, not by address-range. If you know the cache geometry (size and associativity, thus which address bits are used as the index in L1d vs. L2), you could maybe avoid accessing lines that index some sets from the fill-mode CPU, to avoid disturbing lines in those sets. (Except for lines that are known to be hot; those will hit). – Peter Cordes Jul 22 '23 at 18:09
-
Makes sense. So now I need to learn more about selecting lines versus addresses, because this seems undocumented in the Intel manuals. – eternalsquire Jul 22 '23 at 19:34
-
Check the optimization manuals; they should mention the cache geometry. Or really, check an intro computer-architecture textbook for how set-associative caches index sets. We know that Intel client CPUs have 32 KiB 8-way associative L1d caches, or 48 KiB 12-way, precisely because that makes the index bits come from the offset-within-page part of the address, giving VIPT performance with PIPT's lack of aliasing. (And L2 caches are typically 256 KiB 4-way since Skylake; before that, 8-way.) L3 caches use a hash function to index, but L1 and L2 just use a range of bits directly. – Peter Cordes Jul 22 '23 at 19:42
-
See [Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical](https://stackoverflow.com/q/59279049) . (And for the basics, https://en.wikipedia.org/wiki/Cache_placement_policies#Set-associative_cache ). For L1d cache, offset within page decides which set you index. On miss, pseudo-LRU determines which existing line (way) is evicted to make room to allocate the line you just accessed. [Which cache mapping technique is used in intel core i7 processor?](https://stackoverflow.com/q/49092541) – Peter Cordes Jul 22 '23 at 23:29
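(For illustration, a minimal sketch of the set-index arithmetic these comments describe, assuming the common client L1d geometry of 32 KiB, 8-way, 64-byte lines, which gives 64 sets indexed by address bits [11:6]; the 48 KiB 12-way variant also works out to 64 sets, so the index bits are the same. The constants and the `l1d_set_index` helper are illustrative, not taken from any Intel document; CPUID leaf 4 reports the real geometry of a given part.)

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed geometry: 32 KiB / (8 ways * 64-byte lines) = 64 sets,
 * so the set index is address bits [11:6], entirely inside the
 * 4 KiB page offset (hence the VIPT-without-aliasing point above). */
enum { LINE_SIZE = 64, NUM_WAYS = 8, CACHE_SIZE = 32 * 1024 };
enum { NUM_SETS = CACHE_SIZE / (NUM_WAYS * LINE_SIZE) };  /* 64 */

static unsigned l1d_set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

int main(void)
{
    /* Addresses 4 KiB apart map to the same set: once all 8 ways of
     * that set are in use, touching one can evict a line holding the
     * other. Avoiding such aliases from the normal-mode core is how
     * you could steer around the sets holding the no-fill core's data. */
    uintptr_t a = 0x1000, b = 0x2000;
    printf("set(a)=%u, set(b)=%u\n", l1d_set_index(a), l1d_set_index(b));
    return 0;
}
```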
-
I read the optimization manuals. The geometry is indeed discussed at length, but unless I am really missing something, there is nothing regarding the MTRR memory mapping. Will read your following comment. – eternalsquire Jul 23 '23 at 00:54
-
MTRRs are unrelated to this. No-fill isn't something you can set for an address-range, only per-core for *all* its accesses, unless I'm mistaken. – Peter Cordes Jul 23 '23 at 00:56
-
Oh. I guess that simplifies matters, somewhat. So logical core A must prefetch its data into the L2 cache, then switch to CAR mode. Then logical core B, when desired, can switch into CAR mode, read from the cache (based on the same address tags) into a register, leave CAR mode, and then act on that register? – eternalsquire Jul 23 '23 at 01:06
-
Huh? Why would logical core B have to or want to switch to CAR mode? You just access the same physical address and your access will hit in those prefetched cache lines. (Or if you generate a miss at an address that aliases them, evict one of them.) – Peter Cordes Jul 23 '23 at 01:17
-
To access the cache lines that logical core A has nailed down, as a form of IPC. – eternalsquire Jul 23 '23 at 02:33
-
Anyway, the documents discussed all point to a BIOS Writer's Guide for the i3 that is closely held by Intel. So the only actual documentation of how NEM works is the coreboot source. I will ask them. – eternalsquire Jul 23 '23 at 02:35
-
Switching core A to no-fill mode doesn't stop core B from getting hits on those cache lines, regardless of it being in CAR mode or not. (Again, if it even works to have separate settings on different logical cores.) I guess one reason you might want to set no-fill mode on core B is to prevent hardware prefetch from accessing lines nearby and possibly causing evictions, if the lines you access from core B aren't in the middle of a range of cache lines known to be hot in cache. But like I keep saying, the cache lines aren't specially isolated, no-fill mode is literally just that, AFAIK. – Peter Cordes Jul 23 '23 at 06:59
-
@eternalsquire There's something about NEM [here](https://blog.csdn.net/robinsongsog/article/details/9964079). Basically its setup is similar to no-fill mode, but it has two phases. In the first phase, you set bit 0 of MSR 0x2E0 and touch each line you want to cache. I don't know what's different between this setup phase and ordinary caching (if you find out, it would be great if you could share). When done, you set bit 1 of the same MSR to begin the run phase. In this phase the code is still cached in the L1I, but the data lines are not evicted. – Margaret Bloom Jul 23 '23 at 13:05
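(For concreteness, a rough C sketch of that two-phase NEM bring-up, in the style of coreboot's CAR init. The MSR number 0x2E0 and the setup/run bit positions come from the comment above and from coreboot sources, not from a public Intel document; the names `MSR_NO_EVICT_MODE`, `NEM_SETUP`, `NEM_RUN`, and `nem_enable` are illustrative. Real firmware does this in ring 0 assembly before any stack exists, with the region already made cacheable, e.g. via a WB MTRR.)

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_NO_EVICT_MODE 0x2E0   /* illustrative name; MSR number per the comment/coreboot */
#define NEM_SETUP (1ull << 0)     /* phase 1: allow filling the cache-as-RAM region */
#define NEM_RUN   (1ull << 1)     /* phase 2: keep the data lines resident */

static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr" :: "c"(msr), "a"((uint32_t)val),
                     "d"((uint32_t)(val >> 32)));
}

/* car_base/car_size: the address range to use as cache-as-RAM. */
static void nem_enable(uintptr_t car_base, size_t car_size)
{
    /* Phase 1: set the setup bit, then touch every line so it is
     * allocated into the cache. */
    wrmsr(MSR_NO_EVICT_MODE, rdmsr(MSR_NO_EVICT_MODE) | NEM_SETUP);
    for (uintptr_t p = car_base; p < car_base + car_size; p += 64)
        *(volatile uint32_t *)p = 0;   /* write to allocate each line */

    /* Phase 2: set the run bit; from here on the data lines stay
     * resident (no fills or evictions for this region). */
    wrmsr(MSR_NO_EVICT_MODE, rdmsr(MSR_NO_EVICT_MODE) | NEM_RUN);
}
```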