
For the purpose of concurrent/parallel GC, I'm interested in what memory ordering guarantee is provided by the mprotect syscall (i.e. the behavior of mprotect with multiple threads, or the memory model of mprotect). My questions are (assuming no compiler reordering, or with sufficient compiler barriers):

  1. If thread 1 triggers a segfault on an address due to an mprotect on thread 2, can I be sure that everything that happened on thread 2 before the syscall can be observed in thread 1 in the signal handler for the segfault? What if a full memory barrier is placed in the signal handler before performing the load on thread 1? (A sketch of this scenario follows the list below.)

  2. If thread 1 does a volatile load on an address that is set to PROT_NONE by thread 2 and doesn't trigger a segfault, is that enough to establish a happens-before relation between the two threads? Or in other words, if the two threads do (*ga starts as 0, p is a page-aligned address that starts read-only)

    // thread 1
    *ga = 1;
    *(volatile int*)p; // no segfault happens
    
    // thread 2
    mprotect(p, 4096, PROT_NONE); // Or replace 4096 by the real userspace-visible page size
    a = *ga;
    

    is there a guarantee that a on thread 2 will be 1? (assuming no segfault is observed on thread 1 and no other code modifies *ga)
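
To make question 1 concrete, here is a minimal sketch of that scenario. The names, the signal-handler installation, and the C11 fence are only illustrative; it assumes page is already mapped, page-aligned and initially readable, and that the compiler performs no reordering:

    // Minimal sketch of question 1; names and setup are illustrative only.
    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stdatomic.h>
    #include <sys/mman.h>

    static int flag;          // data published by thread 2 before mprotect
    static void *page;        // page-aligned, mapped, initially PROT_READ

    static void segv_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si; (void)ctx;
        // Full memory barrier before the load, as asked in question 1.
        atomic_thread_fence(memory_order_seq_cst);
        int seen = flag;      // question 1: is seen guaranteed to be 1?
        (void)seen;
        // A real handler would restore access or longjmp out rather than
        // return into the faulting load.
    }

    // thread 2: publish data, then revoke access to the page
    static void thread2(void)
    {
        flag = 1;
        mprotect(page, 4096, PROT_NONE);
    }

    // thread 1: the volatile load may fault and enter segv_handler
    static void thread1(void)
    {
        struct sigaction sa = {0};   // handler setup shown here for brevity
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        *(volatile int *)page;
    }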

I'm mostly interested in Linux behavior, particularly on x86(_64), arm/aarch64 and ppc, though information about other archs/OSes is welcome too (for Windows, replace mprotect with VirtualProtect or whatever it is called...). So far my tests on x64 and aarch64 Linux suggest no violations of these, though I'm not sure whether my test is conclusive or whether the behavior can be relied on in the long term.

Some searching suggests that mprotect may issue a TLB shootdown on all threads with the address mapped when a permission is removed, which might provide the guarantee stated here (in other words, providing this guarantee seems to be the goal of such an operation), though it's unclear to me whether future optimization of the kernel code could break this guarantee.

Ref: the LKML post where I asked about this a week ago, with no reply yet...

Edit: clarification of the question. I was aware that a TLB shootdown should provide the guarantee I'm looking for, but I'd like to know whether such behavior can be relied on. In other words, what is the reason such requests are issued by the kernel, since they shouldn't be needed if not for providing some kind of ordering guarantee?

yuyichao
  • What do you mean by *"before the syscall can be observed"*? Observed how? Do you mean the effect of the syscall? If we look at it from the kernel side, the syscall is a request for changing the virtual memory attributes, the effects of which will propagate through the translation buffers, and if more than one userspace thread is running in different cores (or different CPUs depending on the architecture), it is possible they see the effect at different physical moments, unless a flush is done. In Linux, the TLB flush is done, unless the old attributes disallowed access. [...] – Nominal Animal May 03 '17 at 22:58
  • [...] This means that the behaviour differs depending on whether the syscall allows (`PROT_NONE` to `PROT_READ`) or disallows (`PROT_READ` to `PROT_NONE`) access. In Linux, the TLB flush is omitted in the former case, so concurrently running threads may see the change at different times. In the latter case, the TLB flush is done, so the threads should observe the change simultaneously (at the moment of TLB flush) -- although I am not sure if there is hardware where the TLB flush on separate CPU (packages!) is non-simultaneous. – Nominal Animal May 03 '17 at 23:03
  • I suspect that the reason for no replies on LKML is twofold: One, this sounds abstract, and kernel devs are pragmatic; most have enough interesting real-world issues to work with to not care much about abstract questions with no real-world implications. Two, it is difficult to understand what the exact scenario here is. A real-world use case -- "does the mm logic guarantee this scenario works?" -- with an ASCII art diagram showing what happens when in simultaneous threads would help a lot. – Nominal Animal May 03 '17 at 23:06
  • @NominalAnimal Observing a syscall means that a memory operation on the page where the permission was flipped faults (or not) due to the mprotect call. Re the TLB flush, yeah, I know it is done on Linux, though my question is more about what behavior I can rely on, or in other words what guarantee it is trying to maintain by issuing such a flush. Re: a real-world use case, I was hoping the pseudocode in case 2 was enough to start with, seems not =( – yuyichao May 04 '17 at 01:16

1 Answer


So I asked this on the mechanical-sympathy group a day after posting here and got an answer from Gil Tene. With his permission, here's my summary of his answers. The full thread is available here in case there's anything I didn't include or that isn't clear.

For the overall behavior one can expect from the OS (as in, it would be surprising for an OS not to meet these):

  1. A call to mprotect() is fully ordered with respect to loads and stores that happen before and after the call. This tends to be trivially achieved at the CPU and OS level because mprotect is a system call, which involves a trap, which in turn involves full ordering. [In strange no-ring-transition implementations (e.g. in-kernel execution, etc.) the mprotect call would presumably be responsible for emulating this ordering assumption.]

  2. A call to mprotect will not return before the protection request semantically takes hold everywhere within the process. If the mprotect() call sets a protection that would cause a fault, any operation on any thread that happens after this mprotect() call is required to fault. Similarly, if the mprotect() call sets a protection that would prevent a fault, any operation on any thread that happens after this mprotect() call is required to NOT fault.

This essentially means that memory operations on the affected pages on other threads are synchronized with the thread calling mprotect(). More specifically, one can expect that both of the cases mentioned in the original question are guaranteed (see the annotated sketch after the list below), i.e.

  1. If a load on one thread in the affected page is observed to fault due to the mprotect() call, the fault happens after the mprotect() call, and is therefore able to observe all memory operations that happen before the mprotect().

  2. If a load on one thread in the affected page is observed to not fault despite the mprotect() call, the load happens before the mprotect() call, and the mprotect() call and any code after it are therefore able to observe any memory operations that happen before the load.
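
To spell these two cases out against the pseudocode from the question (purely as an illustration; it assumes no compiler reordering, that p is already mapped and initially readable, and that a SIGSEGV handler is installed for case 1):

    // Illustrative only: annotating the two guaranteed outcomes on the
    // pattern from the question (no compiler reordering assumed).
    #include <sys/mman.h>

    int ga;        // starts as 0
    void *p;       // page-aligned, initially readable

    // thread 1
    void thread1(void)
    {
        ga = 1;
        *(volatile int *)p;
        // Case 1: the load faults -> the fault is after mprotect(), so the
        //         SIGSEGV handler can observe everything thread 2 did
        //         before calling mprotect().
        // Case 2: the load does not fault -> it is before mprotect(), so
        //         thread 2's read of ga below must observe ga == 1.
    }

    // thread 2
    void thread2(void)
    {
        mprotect(p, 4096, PROT_NONE);
        int a = ga;   // guaranteed to be 1 whenever thread 1's load did not fault
        (void)a;
    }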

It was also pointed out that transitivity may not hold, i.e. a faulting load on one thread may not be ordered after a non-faulting load on another thread. This can (effectively) be caused by the non-atomicity of the TLB flush, which lets different threads/CPUs observe the change in access permission at different times.

yuyichao
  • I'm actually not so sure about number 2 in your quote. It is a tautology: "Anything that happens after you observed X happens after you observe X". I mean the bit "after this mprotect call" actually should say "after this thread has observed the effect of the mprotect call". The problem with ordering of memory operations is that "before" and "after" are only meaningful when you have memory barriers or explicit ordering instructions. I think what saves us is that a TLB shootdown has to be a memory barrier (I could not imagine working on an architecture where it isn't). – Art May 05 '17 at 12:19
  • But your actual question most likely has no answer: I've never seen a formal memory model of system calls. I think the assumption is "of course they are synchronizing, it would be impossible to get anything done without that", but officially this is most likely undefined. Even on the craziest memory models I've worked with, system calls didn't return until everything was in sync (I have written code where we didn't wait for the TLB to be fully in sync until the last instructions before returning to userland, but it was too dangerous and didn't buy much). – Art May 05 '17 at 12:27
  • `"before" and "after" are only meaningful when ...` -- exactly, that's why when you say A on thread 1 happens before B on thread 2, that implies a synchronization. It is exactly whether this is implied when an mprotect call is made that I'm asking about. `Even on the craziest memory models I've worked with system calls didn't return until everything was in sync` -- and this is exactly what I'm asking about, since I have very limited experience with non-x86/multi-socket systems. And the question also includes the precise definition of "in sync" for mprotect. – yuyichao May 06 '17 at 00:09
  • So another (less formal) way to ask the question is: is "something happens before/after an mprotect on another thread" meaningful, or: what does/should the OS keep in sync after an mprotect syscall? – yuyichao May 06 '17 at 00:11
  • I've had this question at the back of my head during this weekend. Ignore syscalls completely, I think syscalls are a red herring here. Another way of looking at this is: are traps synchronizing? If your CPU has a store buffer that saves writes to different cache lines/pages and one write traps? Will they be flushed in the order they were issued? What if the store buffer is write-combining? The whole point of store buffers is to reorder writes. Will the store buffer drain if one of its stores traps? What if that draining causes another trap? Which trap should be delivered first? – Art May 08 '17 at 09:02
  • This all means that you are outside of what's defined by the memory model of your threads implementation. So I would answer this: "Since you didn't synchronize like pthreads told you, the behavior is undefined." – Art May 08 '17 at 09:05
  • Pthreads is irrelevant here; it doesn't even include anything about page protection or segfaults. This is obviously not going to have defined behavior under pthreads, and that's not its job, nor is it what I'm asking. The pthreads/C standard is not the only standard/guarantee out there. FWIW, I explicitly mentioned that the pseudocode assumes no compiler reordering and is therefore basically an easier way to write asm; in practice the code will be jitted and isn't even related to C. – yuyichao May 08 '17 at 13:28
  • You are right in ignoring the compiler here, it's not where the problem is. On anything not x86 (and also x86 to some degree, but it's much simpler there), you'll be dealing with the CPU reordering and combining writes. The reason why I mention pthreads (or any other threads implementation) is because that's pretty much the only level where operating systems have to have clearly defined memory ordering rules (also, C11 stdatomic.h). Anything beyond that is pretty much undefined. – Art May 09 '17 at 09:23
  • "the only level where operating systems have to have clearly defined memory ordering rules" This is clearly not true. If that's the case, no TLB shootdown would be necessary in mprotect since the effect will eventually propagate there. – yuyichao May 09 '17 at 12:58
  • I've actually worked on TLB shootdown on multiple architectures in a real operating system. Memory operation ordering was not on my mind. At all. Security and process isolation was. We didn't have any memory ordering rules other than "don't break pthreads" and "get stuff into RAM before DMA" and those were implicit, not formally defined. – Art May 10 '17 at 06:12