Branch prediction & speculative fetch mitigation

Question

Why isn’t virtual address (VA) separation enough to mitigate the various Spectre and Meltdown flaws? I mean the generic ones, not the one that attacks the Intel p-cache == v-cache hack; that was such an obviously bad idea that I can’t find any sympathy.

As a baseline:

1. My kernel address space (AS) shares only one text page and one data page with the user AS. Those pages contain just enough code and data to save and restore registers, load a new memory context, and jump to the appropriate place. Thus, there are no interesting addresses to uncover here.
2. No process ASs from exec have any VAs in common. That is, every VA allocation is taken from a common pool, so that even shared objects like libc are at a different address in every process. Most unix-derived folks would find this odd, but it is certainly feasible; I did it once by mistake^H*10/for testing.
3. Fork()’d process images are sandboxed if they are in separate access control domains, to prevent cross leakage. Sandboxing can involve cache eviction on context switch, cpusets that exclude hyper-threads, all the way up to a non-interference kernel.

I understand that [1] is the basic mitigation for Meltdown-related problems, and that [2] is a broadening of [1] so that it applies to Spectre. [3] would cause performance problems, but again, limited to just those cases.

What are you talking about with *attacks the intel p-cache == v-cache hack*? I understand exactly what Spectre and Meltdown are and how they work, but that doesn't sound anything like either of them. It sounds like you're talking about a VIPT L1d cache that avoids aliasing problems by being associative enough that the index bits all come from the offset within a page (and thus translate for free, so the cache behaves like a PIPT but can still do the TLB translation in parallel with fetching data+tags from the indexed set). That's not the cause of Meltdown. — Peter Cordes, Nov 21 '18 at 00:07

score 3 · Accepted Answer · answered Nov 21 '18 at 00:45

Meltdown attacks depend on (speculatively) accessing the target virtual address directly (from within the attacking process)¹.

But Spectre does not. You prime the branch predictor so that the code under attack speculatively accesses its own virtual address space, which it has permission to do. Branch-predictor aliasing means you can often prime the prediction for a branch at a virtual address you don't (or can't) have mapped, e.g. one in the kernel.
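To illustrate why aliasing defeats VA separation, here is a minimal toy model, not a description of any real predictor. It assumes a table of 2-bit counters indexed only by the low bits of the branch address, with no tag; `TABLE_BITS` and both addresses are made-up values chosen so that they collide.

```python
# Toy direction predictor: indexed by low address bits only, no tag.
# TABLE_BITS and the two branch addresses are hypothetical.
TABLE_BITS = 10


class ToyPredictor:
    def __init__(self):
        # 2-bit saturating counters, starting at 1 ("weakly not taken").
        self.counters = [1] * (1 << TABLE_BITS)

    def index(self, branch_va):
        # Only the low TABLE_BITS bits of the VA select the entry.
        return branch_va & ((1 << TABLE_BITS) - 1)

    def predict(self, branch_va):
        return self.counters[self.index(branch_va)] >= 2

    def update(self, branch_va, taken):
        i = self.index(branch_va)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


victim_branch = 0xFFFF_FFFF_8123_4ABC    # pretend kernel VA, unmapped for the attacker
attacker_branch = 0x0000_5555_0000_0ABC  # attacker's own VA with the same low 10 bits

bp = ToyPredictor()
before = bp.predict(victim_branch)
for _ in range(4):
    # The attacker only ever executes its own branch...
    bp.update(attacker_branch, taken=True)
# ...yet the prediction for the victim's branch has changed.
after = bp.predict(victim_branch)
```

The attacker never touches the victim's address, yet it controls the prediction for it. Partial tags, as discussed in the comments for TAGE, shrink the set of colliding addresses but do not remove it.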

The usual side-channel, a cache-read attack, is based on evicting the cache for an array in your own address space. But other side-channels are possible to get the Spectre data from the target back to the attacker, like priming the cache and then looking for which entry was evicted by a conflict-miss for an address which aliases some memory in the process under attack. (Harder because L3 cache in modern x86 CPUs uses a complex indexing function, unlike simpler caches which use a simple range of bits as the index. But possibly you could use L2 or L1d misses. L2 miss / L3 hit should still be measurably longer than an L2 hit.)
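The prime-and-probe variant can be sketched with a toy set-associative cache. This is only a model under assumed parameters: 64-byte lines, 64 sets, 8 ways, strict LRU, and an index taken from a plain range of address bits (as in a simple L1d or L2, not a hashed L3). `ATTACKER_BASE`, `VICTIM_BASE` and the secret are hypothetical, and a "miss" stands in for what a real attacker would have to infer from timing.

```python
from collections import OrderedDict

LINE, SETS, WAYS = 64, 64, 8   # assumed geometry (32 KiB total)
ATTACKER_BASE = 0x10_0000      # hypothetical, aligned to LINE * SETS
VICTIM_BASE = 0x80_0000        # hypothetical, aligned to LINE * SETS


def set_index(paddr):
    # Simple bit-range indexing: address bits [6, 12) pick the set.
    return (paddr // LINE) % SETS


class ToyCache:
    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def access(self, paddr):
        """Return True on a hit; on a miss, fill the line and evict the LRU one."""
        lines = self.sets[set_index(paddr)]
        tag = paddr // (LINE * SETS)
        if tag in lines:
            lines.move_to_end(tag)
            return True
        if len(lines) >= WAYS:
            lines.popitem(last=False)
        lines[tag] = True
        return False


def attacker_lines(set_no):
    # WAYS attacker addresses that all map to the same set.
    return [ATTACKER_BASE + set_no * LINE + way * LINE * SETS for way in range(WAYS)]


cache = ToyCache()

# Prime: fill every set with the attacker's own lines.
for s in range(SETS):
    for addr in attacker_lines(s):
        cache.access(addr)

# Victim (e.g. a Spectre gadget) touches one line chosen by a secret value.
secret = 37
cache.access(VICTIM_BASE + secret * LINE)

# Probe: re-access the attacker's lines and count misses per set.
misses = [sum(not cache.access(addr) for addr in attacker_lines(s)) for s in range(SETS)]
recovered = max(range(SETS), key=lambda s: misses[s])
```

Note that the attacker and the victim never share an address here; the only thing they share is the set that both addresses index into.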

Or with SMT (e.g. Hyperthreading), an ALU timing attack where the Spectre gadget creates data-dependent ALU port pressure. In this case the only relevant memory access is to the data under attack, which the hardware allows; only mis-speculation of the branch causes a rollback, not a load fault.

When attacking the kernel, it will have the physical memory pages of the attacking process mapped somewhere. (Most kernels map all of physical memory to a contiguous range of virtual addresses, allowing easy access to any physical address.) Caching is based on physical addresses, not virtual.

A Spectre gadget that makes a cache line hot via a different mapping for the same page still works.
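A rough sketch of why a second mapping does not help: the toy below tracks "hot" lines by physical address only. The direct-map base, the user page-table entry and the addresses are invented for illustration and are not taken from any real kernel layout.

```python
PAGE = 4096
LINE = 64
DIRECT_MAP_BASE = 0xFFFF_8800_0000_0000  # hypothetical base of a kernel direct map
USER_BASE = 0x0000_7F00_0000_0000        # hypothetical user-space VA of the probe array

# Toy user page table: virtual page number -> physical frame number (made up).
user_page_table = {USER_BASE // PAGE: 0x1234}


def user_translate(va):
    return user_page_table[va // PAGE] * PAGE + va % PAGE


def direct_map_translate(va):
    # In a direct map, virtual = base + physical.
    return va - DIRECT_MAP_BASE


hot_lines = set()  # physical line numbers currently cached


def touch(paddr):
    """Return True if the line was already cached, then mark it cached."""
    line = paddr // LINE
    hit = line in hot_lines
    hot_lines.add(line)
    return hit


user_va = USER_BASE + 0x340
kernel_va = DIRECT_MAP_BASE + user_translate(user_va)  # kernel alias of the same byte

gadget_hit = touch(direct_map_translate(kernel_va))  # gadget loads via the kernel alias
attacker_hit = touch(user_translate(user_va))        # attacker then times its own VA
```

The two virtual addresses have nothing in common, but the attacker's later access still hits because both translate to the same physical line.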

In the context of a system call, the kernel usually keeps user-space memory mapped to the same virtual addresses it was using inside the process, so system calls like read and write can copy between user-space and the pagecache. And many system calls pass user-space pointers to filenames. So when attacking the kernel, a Spectre gadget can directly use user-space addresses in the attacking process.

The Spectre gadget itself could maybe even be in user-space memory, although with separate page tables to work around Meltdown, you might mitigate that by setting the kernel page tables to have user-space VAs mapped without exec permission.

Footnote 1: Meltdown is a bypass for the U/S bit in the page tables, allowing user-space to potentially read any memory the kernel leaves mapped. And yes, [1] is a sufficient workaround. See http://blog.stuffedcow.net/2018/05/meltdown-microarchitecture/.

Modern kernels, at least the ones I’ve worked on, do not map all of physical memory; that is a 1970s thing. Modern kernels don’t even have the means to do so. The branch predictors use hashing? I’ve only truly seen inside one architecture, and it certainly didn’t; so to train the branch predictor, you had to train it with exact VAs. Agreed SMT is mainly garbage, but I guess that was known 15 years ago. — mevets, Nov 21 '18 at 00:54
@mevets: Linux on x86-64 direct-maps all of physical memory, up to 64TB anyway. Note the `direct mapping of all physical memory (page_offset_base)` entry in https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt. With 56-bit 5-level page tables, the direct-map size goes up to 32 PB. — Peter Cordes, Nov 21 '18 at 00:58
Thank you, btw; I was thinking it was a combined laziness between the kernel folks and CPU folks; but that BP-shambles really puts the whole train wreck at the CPU people's feet. — mevets, Nov 21 '18 at 01:00
Linux is not a modern OS, by any stretch of anybody's imagination. — mevets, Nov 21 '18 at 01:01
@mevets: TAGE branch predictors do have "tagged" in the name, but my understanding is that they basically allow aliasing. The most common branch dominates the prediction for that combination of branch history and address, and doesn't have that valuable state wiped out by one rare branch that aliases. Paul Clayton comments that TAGE can/does use *partial* tagging: [Why did Intel change the static branch prediction mechanism over these years?](https://stackoverflow.com/posts/comments/90619942). See also Bee's comments [here](https://stackoverflow.com/posts/comments/81079172). — Peter Cordes, Nov 21 '18 at 01:03
@mevets: Bee's answer on [Do function pointers force an instruction pipeline to clear?](https://stackoverflow.com/a/50557160) has a link to a paper about "an indirect variant of TAGE" which it's widely believed Intel is using. — Peter Cordes, Nov 21 '18 at 01:04
@mevets: And BTW, kernel direct-mapping is probably not relevant if you're attacking via a system call. Like I said, the kernel will have user-space still mapped to user-space VAs. So you can either use that, or the Spectre gadget you find may only index into a kernel array and you'll have to figure out how that aliases your array. But if it indexes relative to a register, you have a good chance of indexing into a VA you know about. — Peter Cordes, Nov 21 '18 at 01:29
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/183998/discussion-between-mevets-and-peter-cordes). — mevets, Nov 21 '18 at 02:28
