
My understanding is that std::mutex lock and unlock have acquire/release semantics, which prevent the instructions between them from being moved outside the critical section.

So acquire/release should prevent both compiler and CPU reordering of instructions.

My question is: I took a look at the GCC 5.1 code base and don't see anything special in std::mutex::lock/unlock that prevents the compiler from reordering code.

I found a potential answer in does-pthread-mutex-lock-have-happens-before-semantics, which points to a mailing-list post saying that an external function call acts as a compiler memory fence.

Is that always true? And where does the standard say so?

SSY
  • No, functions, external or otherwise, have nothing to do with memory fences. – n. m. could be an AI Jun 07 '16 at 17:42
  • 3
    It'd help your question to post a concrete piece of code that you think might have problems. Then we can use the gcc generated assembly to help explain how gcc handles it – M.M Jun 08 '16 at 09:51
  • @n.m. How can a truly extern function (the opposite of inline) not function as a compiler fence? – curiousguy Jun 02 '18 at 16:27
  • @curiousguy Memory fences are hardware things. There are assembly instructions that create memory fences. They have nothing to do with the compiler doing or not doing reordering. – n. m. could be an AI Jun 02 '18 at 17:00
  • @n.m. No, memory fences are asm level things, not hardware. Of course they have everything to do with the compiler. How do you believe they are produced by the compiler? – curiousguy Jun 02 '18 at 17:02
  • @curiousguy Memory fences are CPU instructions, for example for x86 it's MFENCE (opcode 0f ae f0). Compilers normally don't produce these instructions, they are needed in a handful of synchronisation functions like pthread_mutex_lock() where library authors insert them by hand. – n. m. could be an AI Jun 02 '18 at 17:22
  • @n.m. These instructions are either produced in compiled code by some means, or written in asm functions; either way, that code needs to be callable from compiled code. The compiler knows how to call truly extern code, like functions written in assembly or compiled by other compilers. How can calling a truly extern function, like one written in asm, not be a barrier for the compiler? – curiousguy Jun 02 '18 at 17:31
  • (...) The only way a compiler could treat a call to compiled binary code, or to asm source files, as "extern inline" and try to inline it and reorder code, would be to translate the binary code or asm source into an abstract representation of low-level code, and then use the known semantics of that representation to reorder "stuff" (either low-level stuff or more intermediate-level stuff). – curiousguy Jun 02 '18 at 17:40
  • (...) The C/C++/asm/binary compiler would have to know the entire instruction set and respect its semantics (and then it wouldn't reorder around memory fences, by definition), or only part of it (say, everything except instructions that do memory fences), and treat unrecognized part as opaque stuff (and then it wouldn't mess around memory fences, by ignorance). – curiousguy Jun 02 '18 at 17:41
  • @curiousguy Let's try again. A memory fence is a very specific thing, the MFENCE instruction (for x86). Nothing more, nothing less. That's all I wanted to say. I haven't tried to discuss what kind of stuff compilers can or cannot reorder around what kind of function, I just tried to clarify the meaning of the term. If you want to discuss compilers reordering stuff, use a different terminology to avoid confusion. I hear "compiler reorder barrier" here and there. – n. m. could be an AI Jun 02 '18 at 18:32
  • @n.m. 1) I was told that atomic instructions also count as 'memory fence'. 2) The compiler must provide a way to either emit them inside a C/C++ function or call code in asm. Either way, there is no reordering by the compiler around them. – curiousguy Jun 02 '18 at 18:43
  • @curiousguy 1) on x86 it seems to be the case, but this is arch specific. 2) again, I'm not trying to say what compilers can or cannot do, I'm trying to explain what is meant by the term "memory fence", that's all. – n. m. could be an AI Jun 02 '18 at 20:33

4 Answers


Threads are a fairly complicated, low-level feature. Historically, there was no standard C thread functionality; instead, it was done differently on different OSes. Today there is mainly the POSIX threads standard, which has been implemented on Linux and BSD, and now, by extension, OS X, and there are Windows threads, starting with Win32 and on. Potentially, there could be other systems besides these.

GCC doesn't directly contain a POSIX threads implementation; instead it may be a client of libpthread on a Linux system. When you build GCC from source, you have to configure and separately build a number of ancillary libraries, supporting things like big numbers and threads. That is the point at which you select how threading will be done. If you do it the standard way on Linux, you will have an implementation of std::thread in terms of pthreads.

On Windows, starting with MSVC's C++11 compliance, the MSVC devs implemented std::thread in terms of the native Windows threads interface.

It's the OS's job to ensure that the concurrency locks provided by its API actually work -- std::thread is meant to be a cross-platform interface to such a primitive.

The situation may be more complicated for more exotic platforms / cross-compiling, etc. For instance, in the MinGW project (gcc for Windows), you historically had the option to build MinGW gcc using either a port of pthreads to Windows or a native win32-based threading model. If you don't configure this when you build, you may end up with a C++11 compiler which doesn't support std::thread or std::mutex. See this question for more details: MinGW error: ‘thread’ is not a member of ‘std’

Now, to answer your question more directly. When a mutex is engaged, at the lowest level, this involves some call into libpthreads or some win32 API.

pthread_mutex_lock(&mutex);
do_some_stuff();
pthread_mutex_unlock(&mutex);

(The pthread_mutex_lock and pthread_mutex_unlock calls correspond to the implementations of lock and unlock of std::mutex on your platform, and in idiomatic C++11 code, these are in turn called in the ctor and dtor of std::unique_lock, for instance, if you are using that.)

Generally, the optimizer cannot reorder these unless it is sure that pthread_lock_mutex() has no side-effects that can change the observable behavior of do_some_stuff().

To my knowledge, the mechanism the compiler has for doing this is ultimately the same as what it uses for estimating the potential side-effects of calls to any other external library.

If there is some resource

int resource;

which is in contention among various threads, it means that there is some function body

void compete_for_resource();

and a function pointer to this is at some earlier point passed to pthread_create... in your program in order to initiate another thread. (This would presumably be in the implementation of the ctor of std::thread.) At this point, the compiler can see that any call into libpthread can potentially call compete_for_resource and touch any memory that that function touches. (From the compiler's point of view libpthread is a black box -- it is some .dll / .so and it can't make assumptions about what exactly it does.)

In particular, the call pthread_lock_mutex(); potentially has side-effects for resource, so it cannot be re-ordered against do_some_stuff().

If you never actually spawn any other threads, then to my knowledge do_some_stuff(); could be reordered outside of the mutex lock: libpthread then doesn't have any access to resource; it's just a private variable in your source that isn't shared with the external library even indirectly, and the compiler can see that.

Chris Beck
  • No, lock and unlock are not bound to the ctor/dtor of std::mutex. – akim Jun 07 '16 at 16:34
  • Right, I have corrected the answer, thanks for pointing out the mistake. – Chris Beck Jun 07 '16 at 16:41
  • 4
    _There is no standard C thread functionality_ That's [no longer true](http://en.cppreference.com/w/c/thread). – Sean Cline Jun 07 '16 at 16:58
  • @SeanCline: I didn't actually know about that, thank you. – Chris Beck Jun 07 '16 at 17:01
  • 1
    What about link time optimization? Cannot gcc then peek into `libpthread`? –  Jun 07 '16 at 19:03
  • @Hurkyl That is not yet supported by either GCC or any C library I know about. If it ever happens, there *may* need to be special annotations on various pthread functions, informing the compiler of their semantics ... or the compiler might grow clever enough to infer their semantics from the atomic primitives in their implementations. – zwol Jun 07 '16 at 20:40
  • @Hurkyl: That's a good question. The kind of analysis I'm talking about is called *reachability analysis*; it's generally done by the compiler when you have an AST and "know" all of the objects in the program, and can see which ones refer to each other. After emitting object code, almost every pointer dereference becomes opaque -- you would have to anticipate the runtime value of some register, which may be some value returned by malloc and then pointer arithmetic etc. etc... So I don't think link-time reachability analysis is feasible. (continued) – Chris Beck Jun 07 '16 at 21:33
  • 1
    (continued) Even if hypothetically all that was not opaque, reachability analysis is *conservative*. In any sequential model of computation, basically the idea is, give a *conservative* estimate of all of the memory read / written by any given operation. If one operation reads memory that the other writes, then they cannot be reordered. And if not, then generally they can. (You may need to let "memory" more generally include processor state if you look at e.g. a single chip, but those are details.) In pthreads, there should be a clear chain of reachability though. (continued) – Chris Beck Jun 07 '16 at 21:36
  • 1
    (continued). The thread potentially touches the resource -- it also potentially touches / reads the mutex. So touching the mutex may implicitly touch the resource. Calling `lock_mutex` also touches the mutex. So if the reachability analysis concludes these can be reordered I would say it's a bug. I mean, clearly it is a bug, since reordering them does in fact give different results. Regardless of how the compiler understands the OS primitives, either they should be basically opaque to it, and so it assumes the worst and finds reachability, or they aren't opaque, and it sees the clear chain. – Chris Beck Jun 07 '16 at 21:39
  • 1
    @TavianBarnes Sure, LTO is supported *in general*, but LTO *into the C library* (libpthread counts as "the C library") is not supported yet. https://sourceware.org/ml/libc-alpha/2016-02/msg00052.html – zwol Jun 07 '16 at 22:09
  • @zwol Ah I see. Thanks for the link btw! I had seen https://sourceware.org/bugzilla/show_bug.cgi?id=19649#c1 before but yours has considerably more detail. – Tavian Barnes Jun 07 '16 at 23:08
  • 1
    Even _if_ LTO would be added to most C libraries, keeping `pthreads` working would be a good reason to not enable LTO for `libpthread`. – MSalters Jun 08 '16 at 14:19
  • @ChrisBeck "_you would have to anticipate the runtime value of some register, which may be some value returned by malloc_" You really never consider the actual runtime numerical value of a pointer. It doesn't matter "where" objects are located. `malloc` returns fresh memory, and its return value doesn't alias any existing object. The compiler can know a lot of things w/o knowing everything. – curiousguy Jan 20 '20 at 22:58

All of these questions stem from the rules for compiler reordering. One of the fundamental rules for reordering is that the compiler must prove that the reorder does not change the result of the program. In the case of std::mutex, the exact meaning of that phrase is specified in a block of about 10 pages of legalese, but the general intuitive sense of "doesn't change the result of the program" holds. If you had a guarantee about which operation came first, according to the specification, no compiler is allowed to reorder in a way which violates that guarantee.

This is why people often claim that a "function call acts as a memory barrier." If the compiler cannot deep-inspect the function, it cannot prove that the function didn't have a hidden barrier or atomic operation inside of it, thus it must treat that function as though it was a barrier.

There is, of course, the case where the compiler can inspect the function, such as the case of inline functions or link time optimizations. In these cases, one cannot rely on a function call to act as a barrier, because the compiler may indeed have enough information to prove the rewrite behaves the same as the original.

In the case of mutexes, even such advanced optimization cannot take place. The only way to reorder around the mutex lock/unlock function calls is to have deep-inspected the functions and proven there are no barriers or atomic operations to deal with. If it can't inspect every sub-call and sub-sub-call of that lock/unlock function, it can't prove it is safe to reorder. If it indeed can do this inspection, it would see that every mutex implementation contains something which cannot be reordered around (indeed, this is part of the definition of a valid mutex implementation). Thus, even in that extreme case, the compiler is still forbidden from optimizing.

EDIT: For completeness, I would like to point out that these rules were introduced in C++11. C++98 and C++03 reordering rules only prohibited changes that affected the result of the current thread. Such a guarantee is not strong enough to develop multithreading primitives like mutexes.

To deal with this, multithreading APIs like pthreads developed their own rules. From the Pthreads specification, section 4.11:

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads

It then lists a few dozen functions which synchronize memory, including pthread_mutex_lock and pthread_mutex_unlock.

A compiler which wishes to support the pthreads library must implement something to support this cross-thread memory synchronization, even though the C++ specification didn't say anything about it. Fortunately, any compiler where you want to do multithreading was developed with the recognition that such guarantees are fundamental to all multithreading, so every compiler that supports multithreading has it!

In the case of gcc, it did so without any special notes on the pthreads function calls because gcc would effectively create a barrier around every external function call (because it couldn't prove that no synchronization existed inside that function call). If gcc were to ever change that, they would also have to change their pthreads headers to include any extra verbiage needed to mark the pthreads functions as synchronizing memory.

All of that, of course, is compiler specific. There were no standards answers to this question until C++11 came along with its new memory model.

Cort Ammon
  • 1
    Looking into pthread code base, I don't see any compiler barriers, and __pthread_mutex_lock is declared as a normal external function. So I guess you're correct. – SSY Jun 08 '16 at 02:00
  • 2
    `One of the fundamental rules for reordering is that the compiler must prove that the reorder does not change the result of the program..` This is only partially correct. The only guarantee the compiler makes is that the program would run correctly when run from a `single thread`. – Arunmu Jun 08 '16 at 07:13
  • @Kane You didn't see the use of atomic increments and decrements? Those operations do use atomic barriers! – Arunmu Jun 08 '16 at 07:15
  • 1
    @Arunmu Barriers come in two kinds, corresponding to compiler and CPU. The atomic barrier you refer to is at the CPU instruction level, right? My question is how mutex.lock tells the compiler to disable reordering. I think both mutex.lock and atomic increment/decrement must include a CPU instruction-level memory barrier. – SSY Jun 08 '16 at 07:57
  • @Kane I have updated my answer with the intended comment. Can't put it here due to length limitation. – Arunmu Jun 08 '16 at 09:11
  • Arunmu, a correct C/C++11 compiler must consider multiple threads. It is, for example, not allowed to invent any new writes to memory locations, even if it can prove that would be safe in a single-threaded context, as these new writes may violate data-race freedom rules, and result in incorrect program behaviour. – James Greenhalgh Jun 08 '16 at 10:18
  • 2
    @Arunmu What you say is true for C++ before C++11. However, Kane tagged C++11 and referenced `std::mutex`, which only existed in C++11 and afterwards. In C++11, the rules regarding reordering were dramatically revamped to support multithreading. Before then, any guarantees of ordering in a multithreaded setting were *compiler specific*. Libraries like pthreads did depend on the presence of compiler specific guarantees to function. Pthreads has a wording that some functions "synchronize," which limits reordering, and compiler had to support that if they wished to use pthreads. – Cort Ammon Jun 08 '16 at 13:43
  • Fortunately, there's a bit of a circular argument here. It's *impossible* to write multithreaded code without having some sort of reorder guarantee beyond the single-threaded guarantees of C++98 and C++03, so every compiler which permitted multithreading code provided *some* guarantees. – Cort Ammon Jun 08 '16 at 13:44
  • 1
    @Kane The compiler must be conservative. If it cannot *prove* that, by the rules of C++, that there are *no* CPU barriers inside a function call, it is obliged to treat that function call as a compiler barrier, just in case. Thus, if you know a priori that a mutex.lock operation does indeed have a CPU memory barrier, you also can know a priori that the compiler can't reorder around it because you know the compiler can't prove a statement that you know to be false. – Cort Ammon Jun 08 '16 at 13:48
  • @CortAmmon: So you think I'm wrong about my example, what if there are no threads. Consider the following code. `int main() { int resource = 0; pthread_mutex_lock(); ++resource; pthread_mutex_unlock(); }` You think the compiler won't reorder the increment outside of the mutex lock, because the call to external library is a full memory barrier? I think it's not actually true; I think in this case the compiler can see that `pthread` can't modify resource, unless it is doing things that aren't supported by the standard. I don't think that reordering a function call requires deep-inspecting the function. – Chris Beck Jun 08 '16 at 16:41
  • @ChrisBeck If the compiler can *prove* that the reorder doesn't change the result of the [multithreaded] program, then it can do the reorder in C++11. In a Pthreads compliant compiler, however, it cannot, because pthreads has added additional limitations regarding memory reordering which are intentionally unaware of the C/C++ stack model. As far as posix is concerned, all threads *must* have their affairs in order at every synchronization point. In practice, however, even C++11 compilers won't do the reorder, because permitting said reorder causes all sorts of complications with system calls – Cort Ammon Jun 08 '16 at 18:02
  • @CortAmmon: Okay, but this somewhat contradicts what you wrote in your answer "The only way to reorder around the mutex lock/unlock function calls is to have deep-inspected the functions and proven there are no barriers or atomic operations to deal with." Now you say, if it can prove that resource is not visible it can reorder around it in my case. But in some cases that is easy and does not require deep-inspection of pthread internals. – Chris Beck Jun 08 '16 at 18:02
  • @ChrisBeck (note: I deleted my last comment and wrote a new one around when you replied, so the comments may not line up. I did have to back up and realize that there are pathological-but-spec-abiding C++11 compilers that could do what you mention) I suppose you're right. In the corner cases, you *can* construct cases where you can use the visibility rules to skirt around the need for deep inspection. I admit, I was only considering cases where people actually use mutexes - situations where a conflict IS possible. – Cort Ammon Jun 08 '16 at 18:06
  • @ChrisBeck I couldn't find a link to this, but I *do* remember working on a old platform where there was a define macro you could use to tell pthreads "you may assume this program is single threaded," which reduced all of the pthreads calls to very simple integer operations or even no-ops, and presumably also re-enables a lot of reordering options – Cort Ammon Jun 08 '16 at 18:13
  • @CortAmmon "_pthreads has added additional limitations regarding memory reordering which are intentionally unaware of the C/C++ stack model_" What is your source on that? That seems very unlikely. It would mean that a normal C/C++ compiler isn't correctly handling pthread functions unless it treats them specially, and I am pretty sure that no compiler does that or wants to do that, ever, and compiler writers would laugh at the idea – curiousguy Jun 02 '18 at 16:43
  • @CortAmmon "_presumably also re-enables a lot of reordering options_" A reasonable program probably won't have many useful possible reorderings around pthread function calls. – curiousguy Jun 02 '18 at 16:50
  • @CortAmmon Are you saying that any call to a pthread function makes all non volatile C/C++ objects temporarily volatile for the duration of the call? (and then return to their non volatile state) – curiousguy Jun 02 '18 at 17:48
  • @ChrisBeck To disallow reordering of writes to such purely local variable (like `resource`) around opaque function calls, you just have to make them volatile. – curiousguy Jun 02 '18 at 18:24
  • @curiousguy my source is the quoted section of the pthread spec. Pthreads is language agnostic, so would not be written to support special optimization of C++ stack variables. And volatile is a completely different beast all together with very different guarantees. In general, one should not expect usable parallels between volatile and multithreading primitives. – Cort Ammon Jun 02 '18 at 19:10
  • @CortAmmon Since it is language agnostic, why would it try to prevent these optimisations of stack variables? How would these matter? – curiousguy Jun 02 '18 at 19:48
  • @curiousguy As written, anything with a memory address needs to be synchronized if it can be accessed by more than one thread. It is possible for one thread to inspect the stack of another, so synchronization points must affect variables on the stack as well. – Cort Ammon Jun 02 '18 at 22:51
  • @CortAmmon "_if it can be accessed by more than one thread_" exactly; what is the "_the C/C++ stack model_"? – curiousguy Jun 02 '18 at 23:21
  • @curiousguy I think you have misunderstood me, in my scenario, the resource would "serve a purpose" e.g. count actions or something. I just decided not to write a bunch of extraneous boilerplate for the code example. A conforming compiler would not be able to simply ignore "resource". – Chris Beck Jun 04 '18 at 16:23
  • @ChrisBeck Of course I assume that you don't *usually* write code with useless variables called `resource`. But the specific example here under consideration is useless, and I was reasoning about that case. – curiousguy Jun 04 '18 at 17:48
  • @curiousguy The purpose of `resource` was to convey a concept to me, and in that sense it was successful. If it bothers you enough, feel free to amend Chris Beck's code in your mind to include a `return resource` line. – Cort Ammon Jun 04 '18 at 19:07
  • @CortAmmon How a variable is used changes which transformations can take place. *I can't randomly alter example code, as it would change what can be said about such code.* – curiousguy Jun 04 '18 at 21:55
  • @CortAmmon "_As written, anything with a memory address needs to be synchronized if it can be accessed by more than one thread._" @ChrisBeck 's point is that there is no way another function, be it a pthread library function, or any user code, could read that variable, and the compiler can see that. The reordering is legal in the particular case, assuming (to make the example interesting) that the value of `resource` is returned at the end (so the variable isn't useless). – curiousguy Jun 04 '18 at 23:34
  • @curiousguy It's very easy to see that variable if you have access to a pointer to the beginning of the stack, which is easy enough to do before main starts. The only way around that would be to not give the variable a memory address (which would be possible if `resource` could be stored in a register). If it were stored in a register, I agree that pthreads no longer has anything to say about its value. – Cort Ammon Jun 05 '18 at 00:44
  • @CortAmmon With `ptrace` and knowledge of the layout of the stack and the object, and with debugging information, any local variable of a paused thread can be examined, even those that reside in registers. – curiousguy Jun 05 '18 at 01:41
  • @curiousguy True. Some of those local variables have a memory address, and some do not. – Cort Ammon Jun 05 '18 at 03:31
  • @CortAmmon Using ptrace (or another stack-reading tool) to examine the local variables is not sanctioned by the standard. It doesn't count as visible behavior. The compiler can change what you see with ptrace. – curiousguy Jun 05 '18 at 04:01
  • @curiousguy Exactly. That's why I didn't focus on what could be seen via ptrace, but instead focused on that which has a memory address, because *that* is a wording that matters in the specifications. You brought up ptrace. – Cort Ammon Jun 05 '18 at 05:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/172466/discussion-between-curiousguy-and-cort-ammon). – curiousguy Jun 05 '18 at 05:25
  • 1
    @ChrisBeck Curiousguy and I had a long conversation in chat, and I am no longer convinced that the reordering of `resource` is impossible if `resource` is on the stack. I'd always believed it was reorderable if it was stored in a register (which has no memory address), but some of the examples curiousguy put forth make me question things even when we can prove `resource` is on the stack. – Cort Ammon Jun 07 '18 at 00:06

NOTE: I am no expert in this area and my knowledge of it is in a spaghetti-like condition. So take this answer with a grain of salt.

NOTE 2: This might not be the answer the OP is expecting, but here are my 2 cents anyway, if it helps:

My question is that I take a look at GCC5.1 code base and don't see anything special in std::mutex::lock/unlock to prevent compiler reordering codes.

g++ uses the pthread library. std::mutex is just a thin wrapper around pthread_mutex, so you will actually have to go and look at pthread's mutex implementation.
If you go a bit deeper into the pthread implementation (which you can find here), you will see that it uses atomic instructions along with futex calls.

A few things to remember here:
1. The atomic instructions do use barriers.
2. Any function call is effectively a full barrier (I don't remember where I read this).
3. Mutex calls may put the thread to sleep and cause a context switch.

Now, as far as reordering goes, one of the things that needs to be guaranteed is that no instruction after lock and before unlock should be reordered to before lock or after unlock. This, I believe, is not a full barrier, but rather just acquire and release barriers respectively. But this is again platform dependent: x86 provides sequential consistency by default, whereas ARM provides weaker ordering guarantees.

I strongly recommend this blog series: http://preshing.com/archives/ It explains lots of lower-level stuff in easy-to-understand language. Guess I have to read it once again :)

UPDATE: Unable to comment on @Cort Ammon's answer due to length limits.

@Kane I am not sure about this, but in general people write barriers at the processor level, which take care of compiler-level reordering as well. The same is not true of compiler built-in barriers.

Now, since the pthread_*lock* function definitions are not present in the translation unit where you are making use of them (though this is doubtful), calling lock/unlock should provide you with a full memory barrier. The pthread implementation for the platform makes use of atomic instructions to block any other thread from accessing the memory locations after the lock or before the unlock. Now, since only one thread is executing the critical portion of the code, it is ensured that any reordering within it will not change the expected behaviour, as mentioned in the above comment.

Atomics are pretty tough to understand and to get right, so what I have written above is from my own understanding. I would be very glad to know if my understanding is wrong here.

Arunmu
  • "Any function call is equivalent to full barrier. Do not remember from where I read it." That is not necessarily true. Inlined functions may be reorganised with respect to surrounding code as long as the code behaves according to the "as-if" rule - that is true both for functions marked as inline and for functions inlined by compiler and/or linker (LTO) heuristics. Even when the function is not inlined, this is effectively a compiler barrier, not a processor barrier, so there might still be some reordering even on processors with a strong memory model such as x86. – Maciej Piechotka Jun 07 '16 at 18:43
  • 1
    @MaciejPiechotka Yup, not true for inlined functions. And yeah, I think it is effectively a compiler barrier not one enforced by processor. – Arunmu Jun 07 '16 at 18:48
  • "_Any function call is equivalent to full barrier. Do not remember from where I read it._" Any call to a separately compiled function is by its very nature a barrier against compiler reordering, at least for any access to a shared object. That's really the immediate application of the definition of a function call. – curiousguy Jan 20 '20 at 22:55
  • Interesting to me since I provide OS independent wrappers around pthread locking. Thus identifying pthread locking by name would be suicide. The inline functions are interesting, I need to do an analysis on that. – Scott Franco Aug 26 '22 at 04:16

So acquire/release should prevent both compiler and CPU reordering of instructions.

By definition, anything that prevents CPU reordering due to speculative execution also prevents compiler reordering. That follows from the language semantics, even without MT (multi-threading) in the language, so you will be safe from reordering even on old compilers that don't support MT.

But these compilers aren't safe for MT for a bunch of reasons, from the lack of thread protection around runtime initialization of static variables to the implicitly modified global variables like errno, etc.

Also, in C/C++, any call to a function that is purely external (that is: not inline, and not available for inlining at any point), without an annotation explaining what it does (like the "pure function" attribute of some popular compilers), must be assumed to do anything that legal C/C++ code can do. No non-trivial reordering across it would be possible (any reordering that is visible is non-trivial).

Any correct implementation of locks on systems with multiple units of execution that don't simulate a global order on assembly instructions will require memory barriers and will prevent reordering.

An implementation of locks on a linearly executing CPU, with only one unit of execution (or where all threads are bound to the same unit of execution), might use only volatile variables for synchronisation. That is unsafe, as volatile reads resp. writes do not provide any guarantee of acquire resp. release of any other data (contrast Java). Some kind of compiler barrier would be needed, like a strongly external function call, or some asm (""/*nothing*/) (which is compiler specific and even compiler-version specific).

curiousguy
  • "`asm (""/*nothing*/)`" To my great surprise, I read that there is in fact no guarantee that old style `asm("some asm");` have any "clobber" in GCC. Which makes no sense to me. So assume an explicit `"memory"` clobber here. – curiousguy Jan 20 '20 at 22:53
  • And now we are getting close to why giving threading primitives over to the language is a good idea. If the compiler implements locking, it knows the where and why and won't reorder locks. – Scott Franco Aug 26 '22 at 04:19