What is the microarchitectural root cause of ZombieLoad?

Question

My interpretation is that, on a TLB miss, the PMH walks the page table and performs stuffed loads into the load buffer. If it encounters accessed or dirty bits that need to be set, it communicates an exception code that marks the load as needing an assist when it retires (presumably it also places the virtual address of the load that requires assistance somewhere accessible to the MSROM routine).

The exception is triggered when the load retires, which flushes the pipeline and causes a special MSROM uop to be issued at the allocate stage; that uop re-performs the whole walk (I have no idea why the PMH can't perform stuffed writes itself, but this is the general belief as to what happens). It does seem odd, because it means there would have to be a uop that indicates a store to a physical address, and no such uop would be needed if the PMH performed stuffed stores. The MSROM routine would also have to jump to the page fault exception routine if it encounters an invalid or protected entry. If no dirty/accessed bits need to be set, then it is the PMH that communicates the page fault exception code.
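To make the flow I am describing concrete, here is a toy Python model of it. It only encodes my interpretation above, not Intel's actual design: the names `Entry`, `pmh_walk` and `microcode_assist` are made up for illustration, and the permission checks are heavily simplified.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    TRANSLATED = auto()    # walk succeeded, the TLB can be filled
    PAGE_FAULT = auto()    # a fault code is reported
    NEEDS_ASSIST = auto()  # load is marked; the assist runs at retirement


@dataclass
class Entry:
    """One paging-structure entry (hypothetical, simplified)."""
    present: bool = True
    writable: bool = True
    accessed: bool = False
    dirty: bool = False


def pmh_walk(levels, is_store):
    """Assumed hardware walk: it can read entries but cannot update A/D bits."""
    for i, entry in enumerate(levels):
        leaf = i == len(levels) - 1
        if not entry.present or (is_store and not entry.writable):
            return Outcome.PAGE_FAULT
        if not entry.accessed or (leaf and is_store and not entry.dirty):
            return Outcome.NEEDS_ASSIST  # stop here and defer to microcode
    return Outcome.TRANSLATED


def microcode_assist(levels, is_store):
    """Assumed MSROM routine: redo the whole walk, setting A/D bits as it goes."""
    for i, entry in enumerate(levels):
        leaf = i == len(levels) - 1
        if not entry.present or (is_store and not entry.writable):
            return Outcome.PAGE_FAULT  # the assist itself must raise the fault
        entry.accessed = True
        if leaf and is_store:
            entry.dirty = True
    return Outcome.TRANSLATED
```

In this model, a walk over `[Entry(), Entry(present=False)]` first reports `NEEDS_ASSIST` (the upper-level accessed bit is clear), and only the assist discovers the fault at the lower level, which is why I think the MSROM routine needs its own path to the page fault handler.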

The paper suggests that the load just continues and that the L1d cache controller, instead of returning a dummy value or 0 along with the exception code of the cancelled load, returns the contents of a line-fill buffer, which might still hold data populated by the other logical core (and which can then be used to transiently modify the cache for cache timing attacks).
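As a sanity check on my reading of the paper, here is a toy Python model of that behaviour. It is purely illustrative and simulates nothing about real hardware: `ToyL1D`, `sibling_fill`, `assisted_load` and `leak_byte` are hypothetical names, and the "cache" is just a set of probe-array indices standing in for a Flush+Reload style channel.

```python
class ToyL1D:
    """Toy model: one 64-byte line-fill buffer plus a set of cached probe lines."""

    def __init__(self):
        self.line_fill_buffer = bytearray(64)  # stale bytes from the last fill
        self.cached_probe_lines = set()        # probe-array lines currently cached


def sibling_fill(l1d, data):
    """The other logical core misses in L1d; its data passes through the buffer."""
    assert len(data) <= 64
    for i, byte in enumerate(data):
        l1d.line_fill_buffer[i] = byte


def assisted_load(l1d, offset):
    """Assumed behaviour: a load that needs an assist transiently sees stale
    line-fill buffer contents instead of 0 or a dummy value."""
    return l1d.line_fill_buffer[offset]


def leak_byte(l1d, offset):
    l1d.cached_probe_lines.clear()            # "flush" the probe array
    value = assisted_load(l1d, offset)        # transient value, never retired
    l1d.cached_probe_lines.add(value)         # transient touch of probe[value]
    # The pipeline flush discards `value`, but the cache footprint remains.
    for candidate in range(256):              # "reload": find the cached line
        if candidate in l1d.cached_probe_lines:
            return candidate
    return None
```

If the controller returned 0 for the cancelled load, as I would have expected, `leak_byte` in this model would always recover 0 and nothing would leak.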

Is this just a silly mistake on Intel's part, an unforeseen side effect?

There seem to be two questions: (1) Why do the accessed and dirty bits work like that? and (2) What is the microarchitectural root cause of ZombieLoad? I suggest changing the post to keep the first question and remove the second. Regarding the second question, refer to Section 3.2 of the ZombieLoad paper and https://stackoverflow.com/questions/56187269/about-the-ridl-vulnerabilities-and-the-replaying-of-loads. I can post an answer to your first question. — Hadi Brais, May 21 '19 at 17:21
@HadiBrais Okay, I'll do a bit more research and see if I manage to clear either one up; thanks for the direction. But yes, I am essentially saying that I don't know why the PMH behaves that way, i.e. requiring an assist, while also asking why the L1d cache presents the contents of a line-fill buffer rather than, for instance, a zeroed value when its TLB/PMH unit requires a microcode assist. — Lewis Kelsey, May 23 '19 at 20:16
