I observed Skylake SP doing writebacks of clean cache lines on real hardware.
An answer from @Leeor to the post "Where data goes after Eviction from cache set in case of Intel Core i3/i7" states that:
"as of Skylake, some of the CPUs (server segments) no longer have an inclusive L3, but rather a non-inclusive (to support an increased L2). This means that clean lines are also likely to get written back when aging out of the L2, since the L3 does not normally hold copies of them."
I don't understand why a non-inclusive L3 causes clean L2 cache lines to be written back. Can someone explain it to me?
Edit:
I finally found a way to measure the number of those clean writebacks using perf counters. Out of roughly 3.7 billion reads, only 20 resulted in a clean writeback to DRAM.
    Performance counter stats for 'system wide':

        3,697,263,307      uncore_imc_1/event=0x4,umask=0x3/     /* cas_count_read */
                   20      uncore_imc_1/event=0xb8,umask=0x11/   /* wr_cas_rank0 BG0 */

       1826.846941108 seconds time elapsed
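To put the two counts in perspective, here is a small sketch that pulls the numbers out of the perf lines and computes their ratio. The helper name `parse_count` is my own, and the sketch assumes the counter lines look exactly like the ones pasted above (a leading count with thousands separators, followed by the event string).

```python
import re

# Counter lines copied from the perf output above (comments stripped).
PERF_LINES = [
    "3,697,263,307 uncore_imc_1/event=0x4,umask=0x3/",
    "20 uncore_imc_1/event=0xb8,umask=0x11/",
]


def parse_count(line: str) -> int:
    """Return the leading count of a perf stat line, ignoring thousands separators."""
    match = re.match(r"\s*([\d,]+)\s", line)
    if match is None:
        raise ValueError(f"no leading count in: {line!r}")
    return int(match.group(1).replace(",", ""))


# First line is the read CAS count, second line is the write CAS count.
reads, clean_writebacks = (parse_count(line) for line in PERF_LINES)

ratio = clean_writebacks / reads
print(f"{clean_writebacks} write CAS / {reads:,} read CAS = {ratio:.2e}")
print(f"about one write CAS per {reads // clean_writebacks:,} read CAS")
```

With the counts above, this works out to roughly one write CAS per 185 million read CAS on that memory controller.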
Also, I have only observed these clean writebacks on a dual-socket platform.