
Assume that an x86 (x64) CPU is executing atomic instructions (e.g., `lock cmpxchg`) on a certain memory region. At the same time, some device is performing DMA reads from the same memory region without letting the CPU know. Do the atomic operations executed by the CPU also appear atomic to the DMA-ing device?

I think the DMA-ing device should also observe atomic changes, because of bus locking or x86's cache coherency, but I am not sure about this.
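
For concreteness, here is a minimal sketch of the CPU side of the scenario. The `dma_slot` type and `slot_update` function are hypothetical names used only for illustration, and the sketch assumes the shared word is a naturally aligned 8-byte value. On x86-64, compilers typically lower this C11 compare-exchange to `lock cmpxchg`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-byte slot inside a buffer that a device may DMA-read.
 * The explicit 8-byte alignment is the assumption that matters here. */
typedef struct {
    _Alignas(8) _Atomic uint64_t word;
} dma_slot;

/* Atomically replace `expected` with `desired`.
 * Returns true if the slot held `expected` and was updated,
 * false if it held something else (the slot is left unchanged). */
static bool slot_update(dma_slot *slot, uint64_t expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(&slot->word, &expected, desired);
}
```

The question is whether a device reading `word` by DMA, without any cooperation from the CPU, can ever see a mix of the old and new bytes.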

Thanks in advance!

IcicleF
  • The question boils down to what size "atoms" (chunks) DMA effectively reads in. The store or RMW is definitely atomic wrt. all possible observers in the system, that's what `lock` guarantees. (Same guarantee even for [an aligned store of 8 bytes or less](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic-on-x86/36685056#36685056)). So any tearing is down to the observer. – Peter Cordes Dec 03 '21 at 15:05
  • @PeterCordes: Thanks! I have no doubt that the DMA can read multiple 8-byte words that were not modified as a unit. I just want to confirm that DMA does not observe partial 8-byte writes. So, going by Intel Manual Vol. 3 8.1.2, does the CPU take exclusive control of a cache line when accessing it with a `lock`-ed instruction? – IcicleF Dec 03 '21 at 15:10
  • Yes. There are no partial 8-byte writes happening anywhere in the first place that could be observed when you use aligned `mov [rdi], rcx`, or `lock cmpxchg [rdi], rcx` for example (unless it's misaligned, in which case it's extremely slow in order to still guarantee atomicity, but then observers could only be sure they read both parts at once by themselves using `lock`ed operations, not DMA.) – Peter Cordes Dec 03 '21 at 15:12
  • @PeterCordes: Thanks a lot! That really helps. – IcicleF Dec 03 '21 at 15:14
  • (I edited my previous comment a bit, ping in case you didn't see the last edit.) – Peter Cordes Dec 03 '21 at 15:15
  • @PeterCordes: Thanks for the edit. Does this mean misaligned 8-byte writes are not guaranteed to be atomic to DMA? – IcicleF Dec 03 '21 at 15:18
  • Misaligned 8-byte pure-stores (e.g. `mov`) aren't guaranteed atomic at all, regardless of reader. Misaligned `lock`ed RMWs are atomic, but DMA wouldn't have a guaranteed-atomic way to read all of it, unless DMA read atomicity chunk size is wider than CPU read/write guarantees. I wouldn't be surprised if *in practice* on some CPUs, it's actually 32 or 64 bytes, so misalignment wouldn't cause problems unless split across a cache-line boundary. – Peter Cordes Dec 03 '21 at 15:37
  • As far as I know, the `lock` instruction will block every agent (including DMA agents). Today the CPU controls memory access through the system agent, and this will act as the quiescence master during a lock. Earlier there was a lock signal to block the "entire bus". This holds as long as the accessed data is not cached in E/M state (and is a single line, i.e. not crossing a 64B boundary). In this case, IIRC, the PCI traffic is coherent, so the usual delay-the-snoop workflow applies. `lock`ed accesses are always atomic, no matter the alignment. – Margaret Bloom Dec 04 '21 at 14:52
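
The comments above distinguish naturally aligned accesses from misaligned ones, and misaligned accesses that stay within one cache line from ones that are split across a line boundary. A rough sketch of how one might check both conditions for a given address is below. The helper names are hypothetical, and the 64-byte line size is an assumption (it matches the 64B boundary mentioned above, but is not guaranteed for every CPU).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current mainstream x86 CPUs. */
#define CACHE_LINE_SIZE 64u

/* True if an object of `size` bytes at `p` is naturally aligned,
 * i.e. its address is a multiple of its size. `size` must be non-zero. */
static bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p % size) == 0;
}

/* True if an object of `size` bytes at `p` spans two or more cache lines.
 * `size` must be non-zero. */
static bool crosses_cache_line(const void *p, size_t size)
{
    uintptr_t first_line = (uintptr_t)p / CACHE_LINE_SIZE;
    uintptr_t last_line  = ((uintptr_t)p + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For an 8-byte value, `is_naturally_aligned` returning true implies `crosses_cache_line` returns false. The case the comments flag as risky for a DMA reader is the one where `crosses_cache_line` returns true.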
