Can an inner level of cache be write back inside an inclusive outer-level cache?

Question

I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion?

Suppose we have two levels of cache, with L1 nearest the CPU (the inner / lower level) and L2 outside it, nearest to main memory. Can the L1 cache be write-back?

My attempt)
I think L1 must be write-through and cannot be write-back. If a block is replaced in the L1 cache, it has to be written back to L2 and also to main memory in order for inclusion to hold. Hence it has to be write-through, not write-back.

All these doubts arise from the below exam question. :P

Question) For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy which of the following are necessary?

I) L1 must be write-through cache
II) L2 must be a write-through cache
III) The associativity of L2 must be greater than that of L1
IV) The L2 cache must be at least as large as the L1 cache

A) IV only
B) I and IV only
C) I, II and IV only
D) I, II, III and IV

As I understand it, the answer should be option (B).

Peter Cordes · Accepted Answer · 2020-01-22T09:21:14.270

7

Real-life counterexample: Intel i7-series CPUs (since Nehalem) have a large L3, shared between cores, that's inclusive. All levels are write-back (including the per-core private L2 and L1d) to reduce bandwidth requirements for outer caches.

Inclusive just means that the outer cache tags have a state other than Invalid for every line in a valid state in any inner cache. It does not necessarily mean that the data is also kept in sync. https://en.wikipedia.org/wiki/Cache_inclusion_policy calls the stronger property, where the data also matches, "value inclusion", and yes, that does require a write-through (or read-only) inner cache. That's Option B, which is stronger than plain "inclusive".
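
To make the difference between the two definitions concrete, here is a minimal Python sketch. It is a toy model, not how any real hardware is organized: the `State` enum, the dict-based caches, and the function names `is_inclusive` / `is_value_inclusive` are all made up for illustration, and the state given to the stale L2 copy is an arbitrary placeholder.

```python
from enum import Enum

class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"

# Each cache is modelled as {line_address: (state, data)}.
# An address that is missing from the dict counts as Invalid.

def is_inclusive(inner, outer):
    """Tag inclusion: every valid inner line has a non-Invalid entry in outer."""
    for addr, (state, _) in inner.items():
        if state is State.INVALID:
            continue
        outer_state, _ = outer.get(addr, (State.INVALID, None))
        if outer_state is State.INVALID:
            return False
    return True

def is_value_inclusive(inner, outer):
    """Value inclusion: inclusive, and the outer copy also holds the same data."""
    if not is_inclusive(inner, outer):
        return False
    return all(
        outer[addr][1] == data
        for addr, (state, data) in inner.items()
        if state is not State.INVALID
    )

# A write-back L1 has modified line 0x40; L2 still holds the old data.
# The outer state used here (EXCLUSIVE) is only a placeholder for "valid".
l1 = {0x40: (State.MODIFIED, 2)}
l2 = {0x40: (State.EXCLUSIVE, 1), 0x80: (State.SHARED, 7)}

tag_ok = is_inclusive(l1, l2)          # the tag is present in L2
value_ok = is_value_inclusive(l1, l2)  # but the L2 data is stale
print(tag_ok, value_ok)
```

In this model the dirty L1 line keeps the hierarchy inclusive while breaking value inclusion, which is the case a write-through L1 would rule out.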

My understanding of regular inclusion, specifically in Intel i7, is that data can be stale but tags are always inclusive. Moreover, since this is a multi-core CPU, L3 tags tell you which core's private L2/L1d cache owns a line in Exclusive or Modified state, if any. So you know which one to talk to if another core wants to read or write the line. i.e. it works as a snoop filter for those multi-core CPUs.

And conversely, if there are no tag matches in the inclusive L3 cache, the line is definitely not present anywhere on chip. (So an invalidate message doesn't need to be passed on to every core.) See also Which cache mapping technique is used in intel core i7 processor? for more details.
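
As a rough sketch of that snoop-filter role, assume the L3 tag entry records which cores might hold a private copy. The `l3_tags` layout and the `cores_to_snoop` helper below are hypothetical simplifications for illustration, not a description of Intel's actual design.

```python
# Hypothetical L3 tag directory: line address -> set of core ids that may
# hold the line in their private L2/L1d.
l3_tags = {
    0x1000: {0},       # core 0 may own this line (e.g. Exclusive or Modified)
    0x2000: {1, 2},    # cores 1 and 2 may both have a copy
    0x3000: set(),     # present in L3 only, no private copy
}

def cores_to_snoop(l3_tags, addr, requester):
    """Return the cores that must be asked about addr on behalf of requester."""
    if addr not in l3_tags:
        # No tag match in the inclusive L3: no private cache can hold the
        # line, so nothing needs to be snooped or invalidated.
        return set()
    return l3_tags[addr] - {requester}

print(cores_to_snoop(l3_tags, 0x1000, requester=3))
print(cores_to_snoop(l3_tags, 0x4000, requester=3))
```

The point of the sketch is only the early return: an L3 tag miss answers the question for the whole chip without bothering any core.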

To write a line, the inner cache has to fetch it with an RFO (read for ownership) that goes through the outer cache. That happens on an L1d/L2 write miss, i.e. when the line is not already in Exclusive or Modified state. Handling that RFO gives the outer cache a chance to maintain inclusion.

Apparently this is not called "tag-inclusive"; that term may have some other technical meaning. I think I saw it used and made a wrong(?) assumption about what it meant. What is tag-only forced cache inclusion called? suggests that "tag-inclusive" doesn't mean "tags but no data" either.

Having a line in Modified state in the inner cache (L1) means an inclusive outer cache will have a tag match for that line, even if the actual data in the outer cache is stale. (I'm not sure what state caches typically use for this case; according to @Hadi in comments it's not Invalid. I assume it's not Shared either because it needs to avoid using this stale data to satisfy read requests from other cores.)

When the data is eventually written back from L1, the line can be evicted from L1 and remain in Modified state only in the outer cache.
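
Here is a toy Python model of that life cycle: a write-back L1 inside an inclusive L2. Everything in it is an assumption made for illustration (the `TwoLevel` class, a single dirty bit instead of real coherence states, no capacity or associativity limits); it only shows that inclusion of tags survives while the L2 data is stale.

```python
class TwoLevel:
    """Toy write-back L1 inside an inclusive L2 (no capacity limits)."""

    def __init__(self, memory):
        self.memory = memory   # addr -> data
        self.l1 = {}           # addr -> [data, dirty]
        self.l2 = {}           # addr -> [data, dirty]

    def _fill(self, addr):
        # Misses are served through L2, so L2 allocates a tag before L1 does.
        if addr not in self.l2:
            self.l2[addr] = [self.memory[addr], False]
        if addr not in self.l1:
            self.l1[addr] = [self.l2[addr][0], False]

    def write(self, addr, data):
        self._fill(addr)
        # Only L1 is updated: write-back, not write-through.
        self.l1[addr] = [data, True]

    def evict_l1(self, addr):
        data, dirty = self.l1.pop(addr)
        if dirty:
            # The dirty data now lives only in L2.
            self.l2[addr] = [data, True]

    def evict_l2(self, addr):
        if addr in self.l1:
            # Back-invalidate L1 first, otherwise inclusion would break.
            self.evict_l1(addr)
        data, dirty = self.l2.pop(addr)
        if dirty:
            self.memory[addr] = data

    def inclusive(self):
        return set(self.l1) <= set(self.l2)


h = TwoLevel({0x40: 1})
h.write(0x40, 2)
print(h.inclusive(), h.l2[0x40])   # tag present in L2, data there is stale
h.evict_l1(0x40)
print(h.l2[0x40], h.memory[0x40])  # L2 holds the dirty line, memory is old
h.evict_l2(0x40)
print(h.memory[0x40])              # memory finally updated
```

Note the `evict_l2` path: the outer cache cannot drop a line on its own, it has to pull the line out of L1 first, which is the price of keeping inclusion.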

edited Jan 22 '20 at 09:21

answered Dec 23 '19 at 05:37

Peter Cordes



To add to Peter's answer, with multi-level inclusive caches in multi-cores the coherence state per cache line (this is indeed kept with tag as Peter alludes to) is augmented to keep L2 aware of the correct location of data (add a state called shared-to-L1 etc.). – instinct71 Dec 31 '19 at 22:32
If a line is in the Modified state in the L1, it would still be in a valid state in the inclusive L2 because there is space allocated for it in the L2. In this situation, the line in L2 would be *stale*. Invalid means the cache entry is empty and can be used to place a new line. The L3 in Nehalem is really fully inclusive, not just tag-inclusive. An example of a tag-inclusive cache is the L3 in Skylake-SP. – Hadi Brais Jan 22 '20 at 06:07
@HadiBrais: I'm not familiar with the formal terminology for this. Clearly L2 needs a way to track the fact that it's valid-but-stale. I was assuming we'd group this in with Invalid because it can't satisfy a read request, but I can see how the other way makes sense: that state requires evicting from inner caches before it can be replaced. So what exactly do we call that state? Valid-but-stale? – Peter Cordes Jan 22 '20 at 06:12
@HadiBrais: [Multi-level cache for which inclusion holds](//cs.stackexchange.com/q/14174) says that an L1 cache must be write-through for L2 to be inclusive. If not, L1 can hold data that's not in L2. That makes sense to me. Are there different conventions / definitions that people use? You say Nehalem L3 is fully inclusive not just tag-inclusive. Also, perhaps you can answer [What is tag-only forced cache inclusion called?](//cs.stackexchange.com/q/108790) – Peter Cordes Jan 22 '20 at 06:18
@HadiBrais Also, SKX L3 is tag-inclusive? What exactly does that mean, then? Does it still have the property that the private L2/L1 caches can't hold a set of lines that all alias the same set in L3, if there are more lines than L3 associativity? With many more cores than L3 associativity, that could be an occasional problem. I thought we were assuming that SKX had some kind of separate snoop filter, separate from L3 tags. Otherwise how would it be different from Nehalem? – Peter Cordes Jan 22 '20 at 06:21
Regarding your first comment, whatever the coherence protocol is, say MSI, in an inclusive cache, any of M or S could be stale. There is no explicit state for saying "stale" and that's not really needed. It's more like encoded in the implementation of the protocol. Regarding your second comment, write-through is not required for inclusion. Inclusion just means that if a line is present in any state (other than Invalid) in a cache, it has to be present in some state (other than Invalid) in an inclusive cache. That's the only correct definition of inclusivity. – Hadi Brais Jan 22 '20 at 06:44
Wikipedia has a good brief [article](https://en.wikipedia.org/wiki/Cache_inclusion_policy) on this. Right, SKX has a separate snoop filter vs. Nehalem. The key word here is "separate." The snoop filter tracks lines in the whole cache hierarchy on the socket, even if a line is not present in the L3 (i.e., invalid). If the state of a line is evicted from the snoop filter, it's evicted from all the caches, similar to what happens when a line is evicted from an inclusive L3. That's what makes the L3 in SKX tag-inclusive. So the cache is described as having an inclusive directory. – Hadi Brais Jan 22 '20 at 06:44
@HadiBrais: If the snoop filter is *separate* from both L3 and private caches, how is L3 itself tag-inclusive? Wouldn't we just say that the snoop filter is tag-inclusive of all caches on the whole package, including L3 and the private caches? (But not data inclusive because it doesn't have data at all). I didn't have much luck googling for `"tag-inclusive" cache`; is that the standard technical meaning for it; an outer shared cache that's subject to evictions based on a more-associative set of tags? I thought snoop filters were sometimes probabilistic (e.g. only tell you def. not present) – Peter Cordes Jan 22 '20 at 07:36
@HadiBrais: For full inclusion, I guess it can hold the line in Exclusive or Modified state when any inner cache has exclusive ownership of a line, to stop it from answering load/share requests from other cores without checking with the owner core first. But wouldn't you want to distinguish that case from the case where a line is dirty (not written to DRAM yet) but not owned by any core? (e.g. write-back to L3 happened because of conflict misses, and *then* a share request comes in). MESI usually only talks about sibling caches, not inner/outer, but can you have Shared but dirty? – Peter Cordes Jan 22 '20 at 07:41

Ah, now I understand what you meant by tag-inclusive. I thought you were referring to the separate snoop filter in your answer. I have not seen this term before actually; it's just called an inclusive cache. The term you're looking for is "inclusive directory." See the Intel paper titled "NCID: A non-inclusive cache, inclusive directory architecture for flexible and efficient cache hierarchies." Regarding your last comment, I think you may find my [blog post](https://hadibrais.wordpress.com/2019/04/25/considering-snoops-when-counting-cache-hit-and-miss-events-on-client-processors/) helpful. – Hadi Brais Jan 22 '20 at 08:21

Ramdas M · Answer 2 · 2020-01-22T04:55:12.700

1

The answer to your question is IV): L2 only needs to be at least as large as L1, i.e. option A.

Inclusive only means that a line in L1 needs to be present in L2. The line could be modified further in L1, and the state in L1 will reflect that. When some other core looks up L2, it can snoop the state of the line in L1 and force a write-back (WB) if needed.
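
A small Python sketch of that snoop-and-write-back step, under simplifying assumptions: the dict layout, the single-letter state strings, and the `read_from_other_core` helper are all invented for illustration, and dirty tracking toward main memory is left out.

```python
def read_from_other_core(l1, l2, addr):
    """Serve a read for addr that arrives at L2 from another core.

    l1 and l2 map addr -> {"state": str, "data": value}; a missing addr
    means the line is Invalid in that cache.
    """
    if addr not in l2:
        # Miss in the inclusive L2: L1 cannot hold the line either.
        return None
    inner = l1.get(addr)
    if inner is not None and inner["state"] == "M":
        # Snoop hit on a Modified line: force a write-back into L2 and
        # downgrade the L1 copy so it can no longer be written silently.
        l2[addr]["data"] = inner["data"]
        inner["state"] = "S"
    return l2[addr]["data"]


l1 = {0x40: {"state": "M", "data": 2}}
l2 = {0x40: {"state": "M", "data": 1}}   # valid tag, stale data
print(read_from_other_core(l1, l2, 0x40))
```

The reader gets the up-to-date value even though L2 was stale at the moment of the lookup, which is why write-through is not needed for this to work.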

edited Jan 22 '20 at 04:55

answered Jan 03 '20 at 16:45

Ramdas M


You mean to say only Statement (IV) is true? So Option A is correct. :) – rohith Jan 03 '20 at 16:56
None of the four properties are really necessary for inclusion to hold. However, without property IV, making L2 inclusive of L1 wouldn't make sense because the L1 can never be fully utilized. I think the intended meaning of the question is "For inclusion to be useful..." But then other properties would be required: (1) there are other coherent agents in the system that would benefit from having an inclusive L2, and (2) the L2 associativity needs to be at least as big as the L1. In summary, the question is badly written. Anyway, here is an upvote. – Hadi Brais Jan 22 '20 at 06:01
