
I observed Skylake SP doing writebacks of clean cache lines on real hardware.

In an answer to the question "Where data goes after Eviction from cache set in case of Intel Core i3/i7", @Leeor states:

as of Skylake, some of the CPUs (server segments) no longer have an inclusive L3, but rather a non-inclusive (to support an increased L2). This means that clean lines are also likely to get written back when aging out of the L2, since the L3 does not normally hold copies of them.

I don't understand why a non-inclusive L3 would cause clean L2 cache lines to be written back. Can someone explain it to me?

Edit:

I finally found a way to measure the number of those clean writebacks using perf counters. Out of roughly 3.7 billion reads, only 20 resulted in a clean writeback to DRAM.

 Performance counter stats for 'system wide':                                    
                                                                                 
     3,697,263,307      uncore_imc_1/event=0x4,umask=0x3/   /* cas_count_read */               
                20      uncore_imc_1/event=0xb8,umask=0x11/ /* wr_cas_rank0 BG0 */
                                                                                 
    1826.846941108 seconds time elapsed
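To put those two counters in proportion, here is a minimal sketch of the arithmetic, using only the values from the perf output above. The variable names are mine. Note the assumption that the write counter captures every write of interest: going by its label, it covers only rank 0, bank group 0 on `uncore_imc_1`, so the real number of writebacks could be higher.

```python
# Counter values copied from the perf output above.
cas_reads = 3_697_263_307      # uncore_imc_1/event=0x4,umask=0x3/  (cas_count_read)
wr_cas = 20                    # uncore_imc_1/event=0xb8,umask=0x11/ (wr_cas_rank0 BG0)
elapsed_s = 1826.846941108     # wall-clock duration of the measurement

# Assumption: every counted write CAS is a writeback of a clean line,
# which holds only if the workload never stores to memory.
writes_per_read = wr_cas / cas_reads
reads_per_write = cas_reads / wr_cas
writes_per_second = wr_cas / elapsed_s

print(f"writes per read:   {writes_per_read:.2e}")
print(f"reads per write:   {reads_per_write:,.0f}")
print(f"writes per second: {writes_per_second:.4f}")
```

That works out to about one write for every 185 million reads, or about one write every 90 seconds.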

Another thing is that I only observed those clean writebacks on a dual-socket platform.

alexghiti
  • I think that's a mistake; I think Leeor meant to say "likely to get *lost*", i.e. no longer be present in any level of cache. AFAIK, you're correct that clean lines never need to be written back. – Peter Cordes Nov 10 '20 at 07:32
  • Of course, NINE L3 doesn't mean exclusive; it's very possible for L3 to still hold copies of lines that are in L2, e.g. if a core was looping repeatedly over a 2MiB array you'd expect most of it to stay hot in L3. You'd get mostly L2 misses because each core's L2 is only 1MiB. – Peter Cordes Nov 10 '20 at 07:34
  • Or perhaps Leeor is suggesting that L3 could act as a victim cache for L2, and allocate space for clean lines that L2 is evicting. (IIRC Skylake-SP doesn't do this either.) I see that I commented on that answer a couple years ago to ask about that "write-back clean lines" phrasing when it was added. – Peter Cordes Nov 10 '20 at 07:39
  • My first question was correct: I *really* observe writebacks of clean cache lines on Skylake-SP, meaning @Leeor could be right with his statement. But is it the non-inclusivity property that gives rise to those clean writebacks, or is it something else? – alexghiti Nov 10 '20 at 08:10
  • Observe how? Can you include some perf-counter results in your question? If this isn't just hand-waving, we need a precise definition of what you / we mean by "write-back". Do you mean to DRAM, or to L3, or what? (And measured how?) Anyway, I updated your question based on your last comment; please [edit] if I didn't quite capture what you meant. – Peter Cordes Nov 10 '20 at 08:24
  • We develop in-DRAM processors, that's how I observed it. By write-back, I mean that a clean cache line is written to *DRAM*. – alexghiti Nov 10 '20 at 08:48
  • Interesting! You should [edit] that into your question so it's clear to future readers. – Peter Cordes Nov 10 '20 at 08:51
  • You're certain it was not in MESI "Modified" state internally? If you just compared the data, you'd potentially see write-backs of the same data after something like `add [rsp], eax` where EAX happened to be zero. Or `lock cmpxchg [rdi], eax` could also dirty the line without changing the data bytes. Or any ABA sequence of modifications could of course leave a cache line dirty but with the same contents it had before. – Peter Cordes Nov 10 '20 at 08:55
  • I'm certain the state of those cache lines was clean since we only read those lines. – alexghiti Nov 10 '20 at 09:25
  • BTW, Leeor replied to my comment on the older answer: SKX L3 may sometimes act as a victim cache for evictions from L2; that's what he was talking about. But that wouldn't explain write-back *to DRAM*. Interesting that it only happens on a dual-socket system. Perhaps marking a line dirty in some weird corner case simplified some part of the design. – Peter Cordes Nov 19 '20 at 15:04
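
The distinction raised in the comments, between a line that is truly clean and one that is dirty but holds unchanged data, can be illustrated with a toy model. This is only a sketch: `ToyCache` and `Line` are hypothetical names, the model is a generic write-back cache, and it makes no claim about how Skylake-SP actually behaves. It shows that a read-only line is dropped silently on eviction, while a store of the same value (as with `add [rsp], eax` when EAX is zero) still produces a DRAM write.

```python
from dataclasses import dataclass


@dataclass
class Line:
    data: int
    dirty: bool = False  # True plays the role of MESI "Modified"


class ToyCache:
    """Hypothetical write-back cache in front of a dict standing in for DRAM."""

    def __init__(self, dram):
        self.dram = dram
        self.lines = {}
        self.dram_writes = 0

    def load(self, addr):
        # Fill from DRAM on a miss; the line arrives clean.
        if addr not in self.lines:
            self.lines[addr] = Line(self.dram[addr])
        return self.lines[addr].data

    def store(self, addr, value):
        self.load(addr)  # write-allocate
        line = self.lines[addr]
        line.data = value
        line.dirty = True  # set even when the stored value equals the old one

    def evict(self, addr):
        line = self.lines.pop(addr)
        if line.dirty:  # clean lines are simply dropped
            self.dram[addr] = line.data
            self.dram_writes += 1


dram = {0x40: 7, 0x80: 9}
cache = ToyCache(dram)

# Case 1: the line is only read, then evicted.
cache.load(0x40)
cache.evict(0x40)
reads_only_writes = cache.dram_writes

# Case 2: the line receives a store that leaves the bytes unchanged.
cache.store(0x80, cache.load(0x80) + 0)
cache.evict(0x80)
silent_store_writes = cache.dram_writes - reads_only_writes

print(reads_only_writes, silent_store_writes)  # prints "0 1"
```

In this model the only way to get a DRAM write is a store, which is why the observation in the question (writes to DRAM from a read-only workload) is surprising and cannot be explained by unchanged data alone.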

0 Answers