> what does operations like the following do behind the scenes
As far as how it's implemented internally, CPU hardware arbitrates which core gets ownership of the cache line when there's contention. See Can num++ be atomic for 'int num'? for a C++ and x86 asm / cpu-architecture explanation of the details.
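As a minimal sketch, assuming x86 and a typical compiler (`num` is just an illustrative name):

```cpp
#include <atomic>

std::atomic<int> num{0};

void inc() {
    // Compilers typically emit a single `lock add dword ptr [num], 1` here on x86;
    // the lock prefix keeps the cache line owned by this core for the whole
    // read-modify-write, which is the hardware arbitration described above.
    num++;  // same as num.fetch_add(1, std::memory_order_seq_cst)
}
```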
Re: why CPUs and compilers want to load early and store late: see Java instruction reordering and CPU memory reordering.
Atomic RMWs prevent that, as do seq_cst store semantics on most ISAs, where you do a plain store and then a full barrier. (AArch64 has a special interaction between `stlr` and `ldar` to prevent StoreLoad reordering of seq_cst operations, while still allowing reordering with other operations.)
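As a hedged sketch of what that looks like from C++ (the variable name is illustrative; exact code-gen varies by compiler):

```cpp
#include <atomic>

std::atomic<int> ready{0};

void publish() {
    // On x86, compilers typically implement this with `xchg` (or `mov` + `mfence`),
    // i.e. effectively a plain store plus a full barrier, to block StoreLoad
    // reordering. On AArch64 it's a single `stlr`, which only has to stay
    // ordered with respect to `ldar` loads.
    ready.store(1, std::memory_order_seq_cst);
}
```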
> Isn't it obvious that when a programmer write something like the following inside a multi-thread method [...]
What does that even mean? It's not running the same method in multiple threads that's a problem; it's accessing shared data. How is the compiler supposed to know which data will be accessed non-read-only from multiple threads at the same time, outside a critical section?
There's no reasonable way to prove this in general, only in some simplistic cases. If a compiler were to try, it would have to be conservative, erring on the side of making more things atomic, at a huge cost in performance. The other kind of mistake would be a correctness problem, and if that could just happen when the compiler guesses wrong based on some undocumented heuristics, it would make the language unusable for multi-threaded programs.
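For example, consider this hypothetical function; nothing in it tells the compiler whether the pointed-to object is shared:

```cpp
// *p might be a local variable of the caller, or a field reachable from
// many threads. The compiler can't tell from here, so promoting this to an
// atomic RMW "just in case" would penalize every single-threaded caller.
void bump(int* p) {
    (*p)++;
}
```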
Besides that, not all multi-threaded code needs sequential consistency all the time; often acquire/release or relaxed atomics are fine, but sometimes they aren't. It makes a lot of sense for programmers to be explicit about what ordering and atomicity their algorithm is built on.
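For instance, a plain event counter (an assumed example, not from the question) only needs atomicity, not ordering:

```cpp
#include <atomic>

std::atomic<long> hits{0};

void count_hit() {
    // Nothing else is published or acquired via this counter, so relaxed is
    // enough: it's still an atomic RMW, but on weakly-ordered ISAs it avoids
    // the extra barriers that seq_cst would require.
    hits.fetch_add(1, std::memory_order_relaxed);
}
```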
Also, you carefully design lock-free multi-threaded code to do things in a sensible order. In C++, you don't have to use `Interlock...`, but instead you make a variable `std::atomic<int> shared_int;` for example. (Or use `std::atomic_ref<int>` to do atomic operations on variables that other code can access non-atomically, like using Interlocked functions.)
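A minimal sketch of both styles (C++20 for `std::atomic_ref`; the names are illustrative):

```cpp
#include <atomic>

std::atomic<int> shared_int{0};  // always accessed atomically

int plain_int = 0;  // accessed non-atomically in single-threaded phases

void during_parallel_phase() {
    shared_int += 1;  // atomic RMW, seq_cst by default

    // atomic_ref: atomic access to an object that isn't std::atomic,
    // much like calling an Interlocked function on an ordinary variable.
    std::atomic_ref<int> ref(plain_int);
    ref.fetch_add(1, std::memory_order_relaxed);
}
```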
Having no explicit indication in the source of which operations are atomic with what ordering semantics would make it harder to read and maintain such code. Correct lock-free algorithms don't just happen by having the compiler turn individual operators into atomic ops.
Promoting every operation to atomic would destroy performance. Most data isn't shared, even in functions that access some shared data structures.
Atomic RMW (like x86 `lock add [rdi], eax`) is much slower than a non-atomic operation, especially since non-atomic code lets the compiler optimize variables into registers. An atomic RMW on x86 is also a full memory barrier, so making every operation atomic would destroy memory-level parallelism every time you use `+=` or `++`.
e.g. one per 18 cycles throughput on Skylake for `lock xadd [mem], reg` if hot in L1d cache, vs. one per 0.25 cycles for `add reg, reg` (https://uops.info). That's not to mention the lost opportunities to optimize away and combine operations, and the reduced ability of out-of-order execution to overlap independent work.