Modern Intel and AMD chips have large store buffers that hold stores until they commit to the L1 cache. Conceptually, each entry holds the store data and the store address.

For the address part, do these buffer entries hold virtual or physical addresses, or both?

Peter Cordes
BeeOnRope
  • I think a store uop has to check for a legal address during execution; that means reading the TLB. It seems crazy to discard this and force the store buffer to redo virt->phys as it commits to L1d cache. So I think we can rule out storing *only* virtual. – Peter Cordes Apr 13 '20 at 16:07
  • Also that would make store-forwarding correctness hard for cases where the same phys page is accessed via two virt addresses. I think x86 guarantees that your reloads see your own recent stores even in that case. I'm not sure why it would be useful to keep the virtual once you have physical; I don't think store-forwarding can probe only on virtual address, although probing first on virt then again on physical to save latency is plausible. – Peter Cordes Apr 13 '20 at 16:08
  • @PeterCordes I think you are correct: both the PA and the VA are stored in the SB. [This image](https://patentimages.storage.googleapis.com/74/6a/78/c296c09c49efec/US6378062-drawings-page-9.png) seems to confirm this. The PA is filled in after the TLB lookup, which is done in parallel with, among other things, the store-forwarding lookup on the lower 12 bits of the address (loose-net check). That's why we have 4K aliasing. I just remembered that my question about Fallout has these details. – Margaret Bloom Apr 13 '20 at 17:59
  • @MargaretBloom: Note that the low 12 bits of the physical address are also the low 12 of virtual. You don't need to separately store the virtual low 12 in the SB, just check low bits of loads against phys addresses in the SB. But good point about 4k aliasing and the loose-net check happening in parallel with TLB access for loads. – Peter Cordes Apr 13 '20 at 18:06
  • @PeterCordes I think the VA is stored in full length and the PA is missing the lower 12 bits (probably the TLB never deals with those bits). But it's the same, they are just not stored twice. – Margaret Bloom Apr 13 '20 at 18:26
  • @MargaretBloom: Any idea why the VA page bits would be stored at all? Maybe I'm missing something, but I don't see any obvious use for them in the SB. – Peter Cordes Apr 13 '20 at 19:04
  • @PeterCordes - the [patent](https://patents.google.com/patent/US6378062B1/en) that Margaret linked is pretty clear that the entire VA is stored, but it isn't exactly clear why. One thing it mentions is that the VA is available and stored 2 cycles earlier than the PA, so maybe it's to enable fast store forwarding (i.e., if the VAs match, the store definitely forwards), falling back to a slower path to handle unusual VA aliasing cases. There is another patent where they talk about fine-net and coarse-net stuff (also the store forwarding spectre paper) which probably clarifies. – BeeOnRope Apr 13 '20 at 22:19
  • That patent also mentions one mechanism for split line stores: _Additionally, if a store instruction involves storing data to memory locations spanning two cache lines, the MEU signals the data cache memory, and the STD and STA operations are driven to the data cache memory twice, with the data size and the physical address being adjusted accordingly the second time._ FWIW this patent is quite old (1997) and refers to an old 32-bit uarch with 12 store buffer entries so things may have changed a lot in the meantime. – BeeOnRope Apr 13 '20 at 22:20
  • @PeterCordes I think it enables fast store forwarding, it is used in the check algorithm in my fallout question (where the check is made in three steps: lower 12 bits, upper VA, upper PA). – Margaret Bloom Apr 14 '20 at 10:31
  • @Margaret: Ok, `invlpg` is serializing, so yes I guess VA can be sufficient to detect store-forwarding if we require OSes to use it carefully. (Presumably x86 doesn't guarantee what happens if you modify a PTE without doing `invlpg`. A TLB entry could be evicted and replaced while a store was still in flight, leading to spurious store forwarding for a load, making it effectively access the old physical page, even if it runs after loads that access non-forwarded data from the new physical page.) – Peter Cordes Apr 14 '20 at 16:13
  • I wondered if the Core 2 TLB-handling "errata" kerfuffle (https://www.realworldtech.com/forum/?threadid=78469&curpostid=78455 / https://www.zdnet.com/article/linus-contradicts-openbsd-founder-on-intel-tlb-issue/) was related to this possibly-surprising effect, but IDK the details of that. – Peter Cordes Apr 14 '20 at 16:17
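
The three-step check discussed in the comments (lower 12 bits, then upper VA, then upper PA) can be sketched as a small model. This is only an illustration of the idea as the commenters describe it, not a confirmed description of any Intel or AMD design: `StoreEntry`, `forwarding_check`, and the result strings are hypothetical names, 4 KiB pages are assumed, and access size, partial overlap, and timing are ignored (only exact-address matches are modelled).

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # low 12 bits are the same in the VA and the PA


@dataclass
class StoreEntry:
    """Hypothetical store-buffer entry holding the full VA and the PA page number."""
    va: int       # full virtual address (available early)
    pa_page: int  # physical page number (filled in after the TLB lookup)
    data: int


def forwarding_check(load_va: int, load_pa_page: int, entry: StoreEntry) -> str:
    """Classify a load against one store-buffer entry (illustrative model only)."""
    # Step 1: loose-net check on the page offset, which needs no translation.
    if (load_va & OFFSET_MASK) != (entry.va & OFFSET_MASK):
        return "no-match"
    # Step 2: the upper virtual bits also match, so the store can forward.
    if (load_va >> PAGE_SHIFT) == (entry.va >> PAGE_SHIFT):
        return "forward-va"
    # Step 3: different virtual pages; a matching physical page means the
    # two virtual addresses alias the same memory, so forwarding is still needed.
    if load_pa_page == entry.pa_page:
        return "forward-pa"
    # Offsets matched but both page checks failed: a false hit, i.e. 4K aliasing.
    return "4k-alias"


store = StoreEntry(va=0x7000_1040, pa_page=0x55, data=0xAB)
same_address = forwarding_check(0x7000_1040, 0x55, store)    # same VA
false_hit = forwarding_check(0x7000_2040, 0x66, store)       # same offset, other page
virtual_alias = forwarding_check(0x9000_3040, 0x55, store)   # other VA, same PA page
```

In this model, a load that matches only in the low 12 bits passes the first step and is rejected later, which is the 4K-aliasing penalty mentioned above; a load through a second virtual mapping of the same physical page is caught only in the final step.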

0 Answers