
There is already a lot of information on software and hardware memory models, memory fences, store/load reordering etc. However, it all seems to focus on guaranteeing the relative ordering of reads and writes to and from shared memory.

Would it be legal behavior for such a system to delay the writes of a thread altogether for a potentially long time?

For example, consider a thread that does some updates to a data structure in memory and then raises a flag that is supposed to notify other threads of the update:

(dataWritten is initially false)
store value1
store value2
store value3
mfence
store dataWritten (true)

According to most memory models I've read about, the memory barrier guarantees that any other thread cannot observe dataWritten as true, while still reading stale values 1, 2 or 3, i.e. it makes these writes atomic.

But can I be sure that the writes will be seen at all? Would it be legal under the memory model to delay the writes indefinitely, as long as the flag isn't written sooner than the values?

In database terms, can memory models be used to reason about durability (in addition to atomicity and consistency, which can be guaranteed by using memory fences and flags as in the example above)?

Update: Detailed semantics of volatile regarding timeliness of visibility addresses the same topic in the context of the Java Memory Model, and Memory model ordering and visibility? covers it for C++11. Does that discussion apply to hardware memory models as well, i.e. do CPU ISAs give hard guarantees about the order in which writes become visible, but only "soft" guarantees about how soon they become visible?
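
To make the pattern concrete, here is a minimal C++11 sketch of the writer side described above; `value1`..`value3` and `dataWritten` come from the pseudocode, everything else is illustrative and just one way of expressing it:

```cpp
#include <atomic>

int value1, value2, value3;              // plain data published before the flag
std::atomic<bool> dataWritten{false};    // initially false, as in the question

void writer() {
    value1 = 1;
    value2 = 2;
    value3 = 3;
    // Release store: earlier writes cannot be reordered past it, so a reader
    // that observes dataWritten == true also observes value1..value3.
    dataWritten.store(true, std::memory_order_release);
}
```

Note that nothing in this code (or in the pseudocode above) says anything about *when* the release store has to become visible to other threads.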

lxgr
  • *i.e. it makes these writes atomic.* - Atomicity is separate from ordering; it means lack of tearing; you either see the stored value or you see a previous value, not some bytes from each. Barriers can't create atomicity. And BTW, `mfence` is an x86 instruction, but the x86 memory model already guarantees store order. (specifically, x86 memory ordering is program-order plus a store buffer with store-forwarding.) – Peter Cordes Jan 05 '22 at 14:40
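
  (For illustration only, not part of the original post: a minimal sketch of the tearing distinction the comment makes, assuming a 64-bit value.)

```cpp
#include <atomic>
#include <cstdint>

uint64_t plain = 0;                      // unsynchronized; a racy reader could see a
                                         // torn (half-updated) value on targets without
                                         // 64-bit single-copy atomicity
std::atomic<uint64_t> atomicVal{0};      // readers see the old or the new value,
                                         // never a mix of bytes from each

void writeBoth() {
    plain = 0x1111111122222222ULL;       // may tear; also a data race if read concurrently
    atomicVal.store(0x1111111122222222ULL, std::memory_order_relaxed);  // never tears
}
```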

1 Answer


It's difficult to prove a negative--there are a lot of instruction set architectures out there. However, I suspect you're right that no hardware memory model makes any guarantees about eventual visibility for writes.

I highly recommend reading A Formal Specification of Intel® Itanium® Processor Family Memory Ordering: although you probably don't care about Itanium, it's an excellent and readable description of the kinds of guarantees hardware memory models are typically concerned with.

As a practical matter, as long as a CPU is still executing instructions, it will have to flush its writes eventually. Further, as long as the write has hit L2 cache or thereabouts, it should normally be visible to other CPUs due to cache coherence protocols. So I don't think this is something to be especially worried about.
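
To illustrate that practical point, here is a hedged C++11 sketch of the consuming side, assuming the writer publishes `value1`..`value3` and then sets `dataWritten` with release semantics as in the question:

```cpp
#include <atomic>

extern int value1, value2, value3;        // written by the producer thread
extern std::atomic<bool> dataWritten;     // set last, with release semantics

int reader() {
    // In practice this loop terminates: the producer's store buffer drains and
    // cache coherence propagates the flag, even though the formal memory model
    // only promises ordering, not a time bound.
    while (!dataWritten.load(std::memory_order_acquire)) {
        // spin (optionally yield or pause here)
    }
    // The acquire load pairs with the release store, so value1..value3 are
    // guaranteed to hold the published values here.
    return value1 + value2 + value3;
}
```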

Jamey Sharp
  • MESI cache coherency means that commit from the store buffer to L1d cache is the point of global visibility. That can't happen until other copies are invalidated. CPUs always drain their store buffer ASAP because it's not a cache at all, it's a queue that has to make room for future stores. See [Does hardware memory barrier make visibility of atomic operations faster in addition to providing necessary guarantees?](https://stackoverflow.com/q/61591287) / [If I don't use fences, how long could it take a core to see another core's writes?](https://stackoverflow.com/q/51292687) – Peter Cordes Jan 05 '22 at 14:44