
The memory model of x86 processors guarantees that stores become visible to other cores in the order they were made (program order), i.e. as if they drained through a FIFO store queue.

In my experience, this holds for memory-mapped file persistence too, which greatly simplifies high-performance database implementations, among other things: it's trivial to read a truncated log, unlike one that has been randomly corrupted.
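To illustrate why a prefix-consistent log is easy to recover, here is a minimal sketch of a reader for a hypothetical length-prefixed record format (the `[u32 length][payload]` layout and the name `read_truncated_log` are assumptions, not from the question). It stops at the first record that is zero-length (the unwritten, zero-filled tail) or that runs past the end of the buffer. If stores could be lost in arbitrary order, this check would not be enough and each record would need its own checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record format: [u32 length][length bytes], native endianness.
// A zero length marks the unwritten (zero-filled) tail of the log.
// Correct only if the log is a prefix of what was written.
std::vector<std::string> read_truncated_log(const unsigned char* buf, std::size_t size) {
    std::vector<std::string> records;
    std::size_t pos = 0;
    while (size - pos >= sizeof(std::uint32_t)) {
        std::uint32_t len;
        std::memcpy(&len, buf + pos, sizeof len);  // memcpy avoids unaligned loads
        // Stop at the zero-filled tail or at a record cut off by truncation.
        if (len == 0 || len > size - pos - sizeof len) break;
        records.emplace_back(reinterpret_cast<const char*>(buf + pos + sizeof len), len);
        pos += sizeof len + len;
    }
    return records;
}
```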

The memory model for ARM makes no such guarantees.
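On ARM, the store ordering that x86 provides for free has to be requested explicitly. A minimal C++ sketch of the classic message-passing pattern using standalone fences (the names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are illustrative only): the release fence orders the data store before the flag store, and the acquire fence orders the flag load before the data load. Compilers typically emit no instruction for these fences on x86 and a `dmb` variant on ARM.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag published after the data

void producer() {
    payload = 42;
    // On x86 this is only a compiler barrier (TSO already keeps stores in
    // order); on AArch64, GCC/Clang typically emit `dmb ish`.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is observed
    }
    // Pairs with the release fence: the payload store is now visible.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_message_passing() {
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

Without the two fences, the relaxed flag accesses would not order the non-atomic `payload` accesses, and the program would contain a data race.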

Does this mean applications are required to always explicitly (and synchronously) flush memory-mapped files on ARM, e.g. with fsync?

Update: This assumes a concurrently executing reader accessing the same memory-mapped file, simple bare-metal local volumes, and that the reader, OS, and hardware keep running; only the writer might have hung or crashed.
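For the scenario in the update (a writer and a concurrent reader mapping the same file), the file backing does not affect ordering, as the comments below point out. What matters is publishing data with a release store and reading it with an acquire load. A minimal sketch, assuming a single writer and a hypothetical `SharedLog` layout placed at the start of the shared mapping (here it is an ordinary object so the example stands alone): the writer copies bytes past the committed end and then release-stores the new length, and the reader acquire-loads the length and trusts only that prefix.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical layout of a shared log region. In the question's scenario it
// would sit at the start of a MAP_SHARED mapping of the file.
struct SharedLog {
    static constexpr std::size_t kCapacity = 4096;
    std::atomic<std::uint32_t> committed{0};  // bytes readers may trust
    unsigned char data[kCapacity];
};

// Lock-based atomics keep their locks per process, so they would not work here.
static_assert(std::atomic<std::uint32_t>::is_always_lock_free,
              "shared-memory atomics must be lock-free");

// Single writer: copy the payload past the committed end, then publish it.
bool append(SharedLog& log, const void* payload, std::uint32_t len) {
    std::uint32_t end = log.committed.load(std::memory_order_relaxed);
    if (len > SharedLog::kCapacity - end) return false;  // log is full
    std::memcpy(log.data + end, payload, len);           // plain stores
    // Release: all stores above become visible before the new length
    // (typically `stlr` on AArch64, `dmb ish` + `str` on ARMv7).
    log.committed.store(end + len, std::memory_order_release);
    return true;
}

// Concurrent reader: acquire-load the length, then read only that prefix.
std::string snapshot(const SharedLog& log) {
    std::uint32_t end = log.committed.load(std::memory_order_acquire);
    return std::string(reinterpret_cast<const char*>(log.data), end);
}
```

The reader never touches bytes past `committed`, so it never races with the writer's in-progress `memcpy`.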

wizzard0
  • Pretty sure you just got lucky; I don't think the kernel knows what order pages were dirtied in when it's selecting which order to do write-back to disk. If you update a couple bytes in multiple pages, a crash on x86 could have written-back the last one to disk but not the first. (True especially for software-driven write-back, but also for hardware persistent memory like Optane DC PM.) – Peter Cordes Sep 21 '21 at 11:20
  • Or do you mean wrt. readers that use `open`/`read` seeing the data? Yes *that* should respect the memory model, so you can use `std::memory_order_release` to get ordered stores even on ARM. – Peter Cordes Sep 21 '21 at 11:20
  • Yes, sorry, I missed the part that the readers are running concurrently. I agree there's no guarantee the write-back (actually, page-out) happens in the write order. Though the individual *pages* should still be correct on x86 but require a barrier on ARM, no? – wizzard0 Sep 21 '21 at 11:23
  • Um, could you please clarify which statement you're referring to with "no, see my first comment"? – wizzard0 Sep 21 '21 at 11:28
  • re: update: oh, so your readers have also used `mmap` on the same file? So this is just plain old shared memory. The fact that it happens to be backed by a disk file is totally irrelevant to ordering between two running processes that both have those pages mapped; use `std::atomic<T> *ptr` or whatever in C++ with `std::memory_order_release` and `std::memory_order_acquire`, or use `stlr` / `ldar` in ARMv8 assembly to do release stores / acquire loads. (std::atomic works for types where `is_always_lock_free` is true, but not otherwise, because separate processes will have their own hash table of locks.) – Peter Cordes Sep 21 '21 at 11:30
  • `fsync` is relevant only for persistence, although it probably ends up being a full barrier if you do it between two stores! – Peter Cordes Sep 21 '21 at 11:34
  • Sorry, I misread your first comment, deleted my re-statement of my first comment you were replying to. You did agree there wasn't ordering between separate pages, and were just talking about ordering *within* individual pages. Yes, it's plausible on ARM that a DMA read (by a disk controller) could see stores having become visible out of order. – Peter Cordes Sep 21 '21 at 11:36
  • But actually, even on x86, a DMA read doesn't atomically read a whole page at once, so a writer thread that did `page[0] = 1; page[4095] = 1;` could end up committed to disk with just the `page[4095]` store visible but not the `page[0]` store. e.g. if the DMA read of the first byte of the page had already happened, then both those stores become globally visible right after each other, then the DMA read of the last byte of the page sees that 2nd store. – Peter Cordes Sep 21 '21 at 11:39
    re: fsync: yep, I might be mixing things up here, thanks for pointing it out :) So the TL;DR would be "within a page, it's never (prefix-)consistent for mmap>fread, always consistent on x86 for mmap>mmap, consistent everywhere for mmap>mmap with atomics", right? – wizzard0 Sep 21 '21 at 11:40
    Consistent everywhere with atomics: not if you use `std::memory_order_relaxed` for pure-loads and pure-stores. But yes, with atomics, you can get the necessary acquire/release synchronization. (Atomics are necessary to safely/correctly use shared memory with an optimizing compiler; [don't roll your own with `volatile`](https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading/58535118#58535118), and if you don't use either things will break. https://lwn.net/Articles/793253/) – Peter Cordes Sep 21 '21 at 11:44
  • So you edited your question to talk about two processes mmapping the file, but your title still says "persisted to disk". Do you care about that or not? (Many databases do, for recovery after a crash.) – Peter Cordes Sep 21 '21 at 11:46
  • I do care about persistence, but I believe that's a separate question per SO guidelines :) I edited the title to be more specific. Thanks a lot! – wizzard0 Sep 21 '21 at 11:49
  • So then as Peter says, your title and body are asking orthogonal questions. `fsync()` has nothing to do with what another running process on the system will observe, except that it might incidentally happen to act as a memory barrier. If you care about the order in which another process observes your stores, you need a memory barrier (not fsync). If you just care that all the stores become visible reasonably soon, and don't care about the order, then you don't have to do anything except ensure the compiler actually executes the stores. – Nate Eldredge Sep 21 '21 at 14:38
    Btw, are you more interested in ARM32 or ARM64? I don't think the overall answer is different between them, but it might help people give examples that are more relevant to you. – Nate Eldredge Sep 21 '21 at 17:14
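As the comments note, `fsync` (or `msync` for a mapping) is about durability, not about what a concurrently running reader observes. A minimal POSIX sketch of the durability side, using a hypothetical helper `write_durably`: it writes through a `MAP_SHARED` mapping and then calls `msync(..., MS_SYNC)`, which waits until the dirty pages in that range have been written back. It gives no guarantee about the order in which individual stores reach the disk before that call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: write `text` to `path` through a shared mapping and
// request synchronous write-back. Durability only; no cross-process ordering.
bool write_durably(const char* path, const std::string& text) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    const std::size_t len = text.size();
    // mmap of length 0 fails, and the file must be large enough to map.
    if (len == 0 || ftruncate(fd, static_cast<off_t>(len)) != 0) {
        close(fd);
        return false;
    }
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return false;
    }
    std::memcpy(p, text.data(), len);
    // MS_SYNC waits for write-back of [p, p + len); p is page-aligned from mmap.
    bool ok = msync(p, len, MS_SYNC) == 0;
    munmap(p, len);
    close(fd);
    return ok;
}
```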

1 Answer


This is what `ldrex`/`strex` are for: synchronizing between cores.
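`ldrex`/`strex` make a read-modify-write atomic: `strex` fails if the exclusive reservation was lost (for example, another core wrote the location) since the `ldrex`, and the code retries. By themselves they do not order surrounding memory accesses. That ordering comes from `dmb` or, on ARMv8, from the acquire/release forms such as `ldaex`/`stlex` (AArch32) or `ldaxr`/`stlxr` (AArch64). In C++ the retry loop is normally written with `compare_exchange_weak`, leaving instruction selection to the compiler: on ARMv7 that usually becomes an `ldrex`/`strex` loop, and on ARMv8.1+ with LSE it may become a single `cas`-family instruction. A minimal sketch with a hypothetical `atomic_store_max` helper, e.g. for advancing a shared committed offset:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: atomically raise `target` to at least `value`.
// compare_exchange_weak may fail spuriously (like strex losing its
// reservation), so it sits in a retry loop.
void atomic_store_max(std::atomic<std::uint32_t>& target, std::uint32_t value) {
    std::uint32_t current = target.load(std::memory_order_relaxed);
    while (current < value &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
        // On failure, `current` now holds the latest value; re-check and retry.
    }
}

// Several threads race to publish different values; the maximum must win.
std::uint32_t concurrent_max(std::uint32_t threads) {
    std::atomic<std::uint32_t> highest{0};
    std::vector<std::thread> workers;
    for (std::uint32_t i = 1; i <= threads; ++i) {
        workers.emplace_back([&highest, i] { atomic_store_max(highest, i * 100); });
    }
    for (auto& t : workers) t.join();
    return highest.load();
}
```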

old_timer