All of this refers to Linux (kernel version 3.13, if it matters), in case behavior differs between POSIX systems, although if someone knows the situation for other variants I would be interested to hear it.
My present understanding is that:
POSIX read(2) and write(2) calls are atomic with respect to each other (this is mandated by the POSIX standard). If I read() some bytes simultaneously with a write() at the same position, I will see either all of the written data or none of it. EDIT: See comments; on many file systems this is only atomic per page.
write(2) calls are atomic with respect to mmap: if I write() to some bytes and simultaneously read them through a mapping, I will see either all of the write or none of it. I believe this is NOT strictly mandated by POSIX, but is an artifact of the way Linux and many other OSes manage the page cache, and it only holds for writes that fall within a single page.
Writes through an mmap'd region are not guaranteed to be atomic with respect to anything: other readers may see partial writes, and other writers to the same region may interleave with mine. In practical terms there may be a minimal atomic size, but I do not know what it is or how to guarantee it. Does anyone have any insight on this?
If I do a CPU CAS on a memory location in an mmap'd buffer, it will "do what I want" as far as actually having CAS semantics, and any successful write as a result is guaranteed to be atomically visible / invisible to other readers (whether via mmap or read()), as long as I maintain alignment restrictions mandated by the CPU.
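To make the last point concrete, here is a minimal sketch of what I have in mind. It is an illustration, not a reference implementation. The helper name `cas_then_pread` and the file path passed to it are hypothetical. It assumes that a C11 `_Atomic uint32_t` has the same size and layout as a plain `uint32_t` and is lock-free on the target CPU, and that the file system keeps MAP_SHARED mappings coherent with pread() through the page cache (which is my understanding of Linux, but not something I believe POSIX promises in general).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: the atomic type has the same size as the plain type, so a
 * page of file data can be viewed as an atomic slot. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "atomic uint32_t must have the same size as uint32_t");

/* Hypothetical helper: creates/truncates `path`, maps its first page
 * MAP_SHARED, does a CAS 0 -> new_value on the first 4 bytes (page-aligned,
 * hence naturally aligned), then reads those bytes back with pread().
 * Returns 0 on success and stores what pread() observed in *seen. */
int cas_then_pread(const char *path, uint32_t new_value, uint32_t *seen)
{
    int rc = -1;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Extend the file to one page; the new bytes read as zero. */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    void *base = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }

    _Atomic uint32_t *slot = (_Atomic uint32_t *)base;
    uint32_t expected = 0;

    /* The CAS succeeds only if the slot still holds 0. */
    if (atomic_compare_exchange_strong(slot, &expected, new_value)) {
        /* Read the same bytes through the file descriptor. */
        if (pread(fd, seen, sizeof *seen, 0) == (ssize_t)sizeof *seen)
            rc = 0;
    }

    munmap(base, (size_t)page);
    close(fd);
    return rc;
}
```

Calling `cas_then_pread("/tmp/cas_demo.bin", 42, &seen)` from a small `main` is enough to try it. A single-threaded run like this only shows that the CAS result is visible through pread(); it does not demonstrate atomicity under contention, which would need concurrent readers and writers.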
Do I have this straight, and are there implementations or documentation I can look at to get more insight into these interactions?