All of this is intended to refer to Linux, kernel version 3.13 if it matters, in case there are behaviors that differ between Posixen - although if someone knows the situation for other variants it would be interesting.

My present understanding is that:

  1. POSIX read(2) and write(2) calls are atomic with respect to each other (this is mandated by the POSIX standard). If I read() some bytes simultaneously with a write() at that position, I will see either all of the write or none of it. EDIT: See comments; for many file systems it is only atomic by page.

  2. write(2) calls are atomic with respect to mmap - if I write() to some bytes, and simultaneously read the buffer via mmap, I will see either all of the write or none (I believe this is NOT strictly mandated by Posix, but is an artifact of the way Linux and many other OSes manage the page cache, and is only true for writes that hit one page).

  3. Mmap writes are not guaranteed to be atomic with respect to anything - other readers may see partial writes, and other writers in the same section may intermingle. In practical terms there may be a minimal atomic size, but I do not know what this is or how to guarantee it. Does anyone have any insight on this?

  4. If I do a CPU CAS on a memory location in an mmap'd buffer, it will "do what I want" as far as actually having CAS semantics, and any successful write as a result is guaranteed to be atomically visible / invisible to other readers (whether via mmap or read()), as long as I maintain alignment restrictions mandated by the CPU.

Do I have this straight, and are there implementations or documentation I can look at to get more insight into these interactions?

Bryce
  • 2
    I'm not sure that `write(2)` more than one page (i.e. more than 4Kbytes) is atomic w.r.t. `mmap(2)` – Basile Starynkevitch May 27 '14 at 18:56
  • @BasileStarynkevitch I think you're correct. I'm looking at [this](http://stackoverflow.com/a/10651090/812321) and it's indicating that on many file systems the full POSIX guarantees aren't present and writes are only atomic by page. – Bryce May 27 '14 at 19:17

1 Answer

According to the POSIX rationale for write():

I/O is intended to be atomic to ordinary files and pipes and FIFOs. (...) The behavior for other device types is also left unspecified, but the wording is intended to imply that future standards might choose to specify atomicity (or not).

So POSIX does not seem to guarantee atomicity on special files, for example when mmap()ing /dev/zero.

Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}.

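Note that POSIX states the {PIPE_BUF} guarantee for pipes and FIFOs: a write of PIPE_BUF bytes or fewer is atomic there, while a larger write may be interleaved with data from other writers. It should not be read as a size limit that makes writes to regular files atomic.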
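To see what {PIPE_BUF} actually is on a given system, it can be queried at run time with fpathconf(). The following is a minimal sketch; `query_pipe_buf` is a hypothetical helper name, and it assumes a throwaway pipe created just for the query:

```c
#include <unistd.h>

/* Hypothetical helper: creates a temporary pipe, asks the system for its
 * atomic write limit, and closes the pipe again.
 * Returns the limit in bytes, or -1 on error or if no limit is defined. */
long query_pipe_buf(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    /* _PC_PIPE_BUF is the run-time counterpart of the PIPE_BUF constant. */
    long limit = fpathconf(fds[1], _PC_PIPE_BUF);

    close(fds[0]);
    close(fds[1]);
    return limit;
}
```

POSIX requires the value to be at least 512 bytes; the result applies to the pipe that was queried and says nothing about regular files.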

From the Linux man page for write(2), in a passage noting that POSIX requires updates of the file offset to be atomic:

However, on Linux before version 3.14, this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect to updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.

The same, incidentally, is true for read().
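On kernels affected by the shared-offset problem, one way to avoid depending on the file offset at all is to pass explicit offsets with pwrite(). This is only a sketch: `write_at` is a hypothetical helper, and it assumes each writer already knows which byte range it owns. It does not make the data transfer itself atomic with respect to concurrent readers.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes from buf at an absolute offset,
 * retrying after short writes and EINTR. The shared file offset is
 * neither used nor modified. Returns 0 on success, -1 on error. */
int write_at(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress: give up instead of spinning */
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}
```

Because every call names its own offset, two processes sharing one open file description cannot clobber each other through a racy offset update.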

I couldn't find anything confirming 2). POSIX does not require this:

The application must ensure correct synchronization when using mmap() in conjunction with any other file access method, such as read() and write(), standard input/output, and shmat().

The same holds for 3): POSIX gives no atomicity guarantee for stores made through a mapping. That makes sense, because providing one would require knowing object sizes when writing to memory locations.
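One way an application can supply the synchronization POSIX asks for when mixing mmap() with read()/write() is an advisory byte-range lock that all cooperating processes take around either kind of access. This is a sketch under that assumption; `lock_range` is a hypothetical helper, and fcntl() locks only help if every process uses them.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sets an advisory lock of the given type
 * (F_RDLCK, F_WRLCK or F_UNLCK) on bytes [start, start + len) of fd,
 * blocking until the lock can be granted.
 * Returns 0 on success, -1 on error. */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* start is an absolute file offset */
    fl.l_start = start;
    fl.l_len = len;

    return fcntl(fd, F_SETLKW, &fl);
}
```

A writer would take F_WRLCK before calling write() or before storing through the mapping, a reader would take F_RDLCK, and both would release with F_UNLCK. These locks are held per process, so threads inside one process still need their own mutex.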

If you use atomic compare-and-swap operations, then individual words will be updated atomically. That does not make a larger write atomic as a whole, unless you use the CAS operations to build more complex mechanisms such as locks or transactional memory.
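A word-sized CAS on a shared mapping can be sketched with C11 atomics. The helper names `map_shared` and `cas_u32` are hypothetical, and the sketch assumes that `_Atomic uint32_t` is lock-free and has the same size and layout as a plain `uint32_t`, which holds on mainstream Linux platforms but is not promised by the C standard.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: maps the first len bytes of fd as shared,
 * read-write memory. Returns NULL on failure. The returned address is
 * page-aligned, so a 32-bit word at offset 0 is naturally aligned. */
void *map_shared(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: atomically replaces *slot with desired if and
 * only if it currently holds expected. Returns true when the swap
 * happened. */
bool cas_u32(_Atomic uint32_t *slot, uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

Other processes that map the same file with MAP_SHARED and access the word through atomic loads see either the old or the new value, never a mix. Updating several words as one unit still needs a lock or a similar protocol built on top of this primitive.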

http://man7.org/linux/man-pages/man2/write.2.html

http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

kiko
hdante