What does the standard guarantee about data races?

Question

This is a continuation of the discussion on multithreading issues in C#.

In C++, unprotected access to shared data from multiple threads is undefined behavior* if a write operation is involved. What is it in C#? As (the safe part of) C# doesn't contain undefined behavior, are there any guarantees? C# seems to have a kind of as-if rule as well, but after reading the mentioned part of the standard I fail to see what the consequences of an unprotected data access are from the language's point of view.

In particular, it would be interesting to know which kinds of optimizations, including load fusing and load invention, are prohibited by the language. Such a prohibition would imply the validity (or the lack thereof) of several popular patterns in C# (including the one discussed in the original question).

[The details of the actual implementation in Microsoft CLR, despite being very interesting, are not part of this question: only the guarantees given by the language itself (and therefore portable) are under discussion here.]

Normative references are very welcome, but I suspect the C# standard has enough information on the topic. Maybe someone from the language team can shed some light on the actual guarantees that are going to be included in the standard later but can be relied upon right now.

I suspect that there are some implied guarantees, like the absence of ~~pointer~~ reference tearing, because tearing could easily break type safety. But I'm not an expert on the topic.

*Often shortened to UB. Undefined behavior allows a C++ compiler to produce literally any code, including code that formats the hard disk, or to crash at compile time.

If you don't get a good answer here, you might have better luck in the `#allow-unsafe-blocks` channel on the [C# discord](https://aka.ms/csharp-discord) -- a bunch of compiler/runtime people hang out there — canton7, Jun 07 '22 at 08:20
@Vlad why are you asking, and what kind of behavior would you expect in a many-core machine where access to RAM isn't uniform? Memory access is controlled by the runtime, not C# and the runtime is generally more lax than x86 CPUs, to avoid introducing avoidable blocking. Even laptops can have 12 cores nowadays. A 12-way barrier would be ... undesirable. — Panagiotis Kanavos, Jun 07 '22 at 10:06
@PanagiotisKanavos: One seldom writes multithreaded programs for specific hardware only. The guarantees and requirements for multithreaded access must be clearly specified by the language itself because (1) one must be able to write a portable program, and (2) there is no guarantee as to how any given language construct will be represented in the JITted code (there is not even a guarantee that a JIT is involved at all). Other languages, e.g. C++ or Java, define the rules and guarantees I'm asking about in considerable detail. — Vlad, Jun 07 '22 at 10:10
To write a portable program *don't* try to write to memory from multiple threads. Otherwise you *have* to write hardware- or OS- specific code because you depend on it. From the other question I see you already found the articles on the C# memory model which already explain what I wrote. There's a [Github issue](https://github.com/dotnet/runtime/issues/63474) to write an official memory model document, but it's still open. — Panagiotis Kanavos, Jun 07 '22 at 10:15
@PanagiotisKanavos: Well, the articles I found are mostly describing CLR's implementation details and not the guarantees that the language is giving, so they are not extremely helpful. Thank you for the github link, the discussion even lists my original question (https://github.com/dotnet/runtime/issues/63474#issuecomment-1014183163) about events. Anyway I'd like to get an official (TM) statement of the language team or a reference to it. — Vlad, Jun 07 '22 at 10:23
Note, this is probably a runtime question, not a language question. C#'s memory model will be "whatever the runtime provides" — canton7, Jun 07 '22 at 10:24
@canton7: I strongly disagree, because the correctness of a multithreaded program IMO must be guaranteed by the language itself and not by the underlying runtime. To defend my point, both C++ and Java give the guarantees (or lack thereof) on the language standard side, not on the underlying runtime (which is with all due respect just an implementation detail, which a portable program has to ignore). — Vlad, Jun 07 '22 at 10:26
Yes, but I think the C# spec (ECMA-334) will just reference the CLR spec (ECMA-335). Note that the CLR spec is not a Microsoft thing, .Net is a *specific* implementation of it — Charlieface, Jun 07 '22 at 11:24
@Charlieface: I assume that there are multiple runtimes, including Microsoft CLR (Framework/Core), the Mono runtime and even browser-based runtimes for the WASM target. Each of the runtimes can run on plenty of machine architectures, which certainly adds even more complexity. — Vlad, Jun 07 '22 at 11:28
Yes, but they all conform to ECMA-335, as mentioned. You can look at that document for the guarantees provided (very little outside of pointer-sized objects and interlocked operations. Tearing is fair game). Do not rely on specific *implementations* or processor architectures, do rely on the runtime *specification*. — Charlieface, Jun 07 '22 at 11:31
C# is not like C or C++. Its standard, while it does try well enough to describe the language, is not as well-defined and implementation-independent as those languages. A lot of finer detail is left implicit, as a kind of "nudge nudge, wink wink, we all know what the runtime will do there" thing. The expectation that the language standard is absolutely normative and will restrict the runtime environment except only in those cases where things are left open, as with C or C++, is, for better or worse, not warranted. — Jeroen Mostert, Jun 07 '22 at 14:12
@Charlieface: Well, there is another problem with just referring to the CLR: there is no guarantee that a variable assignment in C# will be compiled down to a memory write. Extreme example: https://godbolt.org/z/GqYzqjYvq (the C++ optimizer optimized out the variables and the loop altogether, folding the computation into a [closed formula](https://en.wikipedia.org/wiki/Triangular_number)). This means that a variable in the source code doesn't have to correspond to a memory location in the compiled code (at least in C++). — Vlad, Jun 07 '22 at 14:45
That is true, you need ECMA-334 to confirm that, but in the main the C# compiler will not do that (it relies on the JIT compiler to do that). Reordering and multi-threading guarantees are two of the reasons why. — Charlieface, Jun 07 '22 at 19:51

JonasH · Answer 1 · 2022-06-07T14:04:59.363

2

The .NET runtime guarantees that reads and writes of some variable types are atomic:

> Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.

Not mentioned is IntPtr, which I believe is also guaranteed to be atomic. Since references are atomic, they are guaranteed not to tear. See also "C# - The C# Memory Model in Theory and Practice" for more information.
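
A minimal sketch of what this means for a 64-bit field, assuming a hypothetical `SharedTimestamp` class (the class and member names are illustrative only). Since a plain read or write of a `long` need not be atomic, the `Interlocked` library methods can be used so that each access happens as one indivisible operation:

```csharp
using System.Threading;

// Hypothetical holder for a 64-bit value shared between threads.
public sealed class SharedTimestamp
{
    // Plain reads and writes of a long "need not be atomic" per the quote above.
    private long _ticks;

    // Interlocked.Exchange stores all 64 bits as one atomic operation.
    public void Set(long ticks) => Interlocked.Exchange(ref _ticks, ticks);

    // Interlocked.Read loads all 64 bits as one atomic operation.
    public long Get() => Interlocked.Read(ref _ticks);
}
```

A reader calling `Get()` then observes either the old or the new value, never a mix of halves from two different writes.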

There should also be a guarantee of memory safety, i.e. that any memory access will reference valid memory and that all memory is initialized before use, with some exceptions for things like unmanaged resources, unsafe code and stackalloc.

The general rule with regard to optimization is that the compiler/jitter may perform any optimization as long as the result would be identical for a single-threaded program. So tearing, fusing, reordering, etc. would all be possible absent any synchronization.

So always use appropriate synchronization whenever there is a possibility that multiple threads use the same memory concurrently for anything except reading. Note that ARM has weaker memory ordering guarantees than x86/x64, further emphasizing the need for synchronization.
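
One possible form of such synchronization, sketched under the assumption of a hypothetical `Mailbox` class in which a single writer publishes once. The writer stores the payload and then sets a flag with `Volatile.Write`; the reader checks the flag with `Volatile.Read` before touching the payload. This is only an illustration of the documented ordering behavior of `System.Threading.Volatile`, not the only valid approach (a `lock` would also work):

```csharp
using System.Threading;

// Hypothetical one-shot mailbox: one thread publishes, other threads poll.
public sealed class Mailbox
{
    private int _payload;
    private bool _ready;

    public void Publish(int value)
    {
        _payload = value;
        // Volatile.Write: earlier reads/writes cannot be moved after this store,
        // so the payload is written before the flag becomes visible.
        Volatile.Write(ref _ready, true);
    }

    public bool TryRead(out int value)
    {
        // Volatile.Read: later reads/writes cannot be moved before this load,
        // so the payload is read only after the flag has been observed.
        if (Volatile.Read(ref _ready))
        {
            value = _payload;
            return true;
        }
        value = 0;
        return false;
    }
}
```

Without the two `Volatile` calls, the single-threaded as-if rule would permit the flag and payload accesses to be reordered, which matters most on weakly ordered hardware such as ARM.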

edited Jun 07 '22 at 14:04

answered Jun 07 '22 at 13:58


So you basically mean that the pattern discussed in the [predecessor question](https://stackoverflow.com/q/37173038/276994) (that is, `EventHandler localCopy = SomeEvent; if (localCopy != null) localCopy(this, args);`) is not actually thread-safe? – Vlad Jun 07 '22 at 14:34
Well, if really _any_ transformation valid from single-threaded POV is allowed, so introducing a temporary modification of a global object (which is otherwise untouched by the current thread) must be allowed as well, right? – Vlad Jun 07 '22 at 14:36
So you basically say that there is no guarantee at all except using explicit synchronization. What I would really appreciate is a kind of reference to official (or at least semiofficial) documents which prove this. I need this in order to argue against the assumption that e. g. optimizer is not allowed to introduce reads. – Vlad Jun 07 '22 at 14:40
[Multicast delegates are immutable](https://learn.microsoft.com/en-us/dotnet/api/system.delegate?redirectedfrom=MSDN&view=net-6.0), i.e. adding or removing a handler is a read-modify-write, so it depends on what you mean by thread-safe. Absent any synchronization I would expect your example code to invoke *some* version of the delegate. But it could absolutely have a data race, and if there are concurrent modifications the resulting delegate may contain only some of them. – JonasH Jun 07 '22 at 14:48
Well, my original question about the events/delegates was whether the code in question may throw a `NullReferenceException` due to the load introduction (transformation into `if (SomeEvent != null) SomeEvent(this, args);` eliding the local variable) and modification by other thread after the test against `null`. – Vlad Jun 07 '22 at 14:50
@Vlad I'm honestly not sure. The as-if rule suggests that it could throw a nullref exception. But I'm having real difficulty imagining any circumstance where it would be useful to re-read a variable from memory unless the code actually calls for it. I would be fairly confident that such a (de)optimization would not be done in practice. – JonasH Jun 07 '22 at 15:36
Well, the transformations like we are discussing were observed in the wild, as it was mentioned in the discussions to the predecessor question. In this case we are trading one read from the heap, one write to the stack and two reads from the stack against two reads from the heap (which might be a good deal, especially if the memory address is cached). [Storing the value to the register might be not an option if registers are used up by other variables.] – Vlad Jun 07 '22 at 15:44
@Vlad The particular version you mention is *not* guaranteed thread-safe by the runtime or compiler, and I don't think anyone could ever suggest it would be, without knowing the specific JIT compiler and CPU architecture. Instead use `SomeEvent?.Invoke` or a local variable, which is guaranteed thread-safe. – Charlieface Jun 07 '22 at 19:53
"the .net runtime guarantees" but you reference the C# spec. ECMA-335 Section 12.6.6 might be a better place to reference – Charlieface Jun 07 '22 at 19:56
@Charlieface: The version with the local variable was advertized as thread-safe (otherwise it would make no sense to introduce the local variable, right?) – Vlad Jun 10 '22 at 09:33
In other languages the multithreading guarantees are part of the language standard, which allows writing portable programs. That's why I'm always referring to the C# standard and not to the runtime (which must be considered replaceable). – Vlad Jun 10 '22 at 09:36
Why *must* the runtime *standard* be considered replaceable? C# is not designed, for example, to run on the Java VM. It is designed to run on an ECMA-335-compatible runtime. Yes, using a local variable is thread-safe. If it wasn't, then nothing would ever be thread-safe. Possibly you may need a memory barrier when *overwriting* with a new subscriber. – Charlieface Jun 10 '22 at 16:27

score 2 · Answer 2 · answered Jun 07 '22 at 20:13

As mentioned by @JonasH, the C# spec only guarantees atomic access to values of 32 bits or smaller, and to references.

But, assuming you can rely on C# always being implemented on a runtime conforming to ECMA-335, then you can rely on that spec also. This should be safe, as all implementations of .Net, including Mono and WASM, conform to ECMA-335 (it is not a Microsoft-only spec).

ECMA-335 guarantees atomic access to native-sized values, which includes IntPtr and object references, as well as 64-bit integers on a 64-bit architecture.

ECMA-335 says: (my bold)

> 12.6.6 Atomic reads and writes
>
> A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §12.6.2) when all the write accesses to a location are the same size. Atomic writes shall alter no bits other than those written. Unless explicit layout control (see Partition II (Controlling Instance Layout)) is used to alter the default behavior, data elements no larger than the natural word size (the size of a native int) shall be properly aligned. Object references shall be treated as though they are stored in the native word size.
>
> [Note: There is no guarantee about atomic update (read-modify-write) of memory, except for methods provided for that purpose as part of the class library (see Partition IV). An atomic write of a "small data item" (an item no larger than the native word size) is required to do an atomic read/modify/write on hardware that does not support direct writes to small data items. end note]
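
To illustrate the note about read-modify-write, here is a minimal sketch assuming a hypothetical `CounterDemo` helper (the names are illustrative only). A plain `counter++` is a separate read, add and write, so concurrent increments may be lost; `Interlocked.Increment` is one of the class-library methods provided for atomic updates:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper contrasting atomic updates with plain increments.
public static class CounterDemo
{
    // Runs `iterations` increments in parallel and returns the final count.
    public static int CountWithInterlocked(int iterations)
    {
        int counter = 0;
        // Each increment is a single atomic read-modify-write, so none is lost.
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        // Parallel.For returns only after every iteration has completed.
        return counter;
    }
}
```

Replacing the `Interlocked.Increment` call with `counter++` would still compile, but the result could then fall short of `iterations`.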

You seem to be asking specifically about the atomicity of the code:

    if (SomeEvent != null) SomeEvent(this, args);

This code is not guaranteed to be thread-safe, either by the C# spec or by the .NET spec. While it is true that an optimizing JIT compiler might generate thread-safe code, it's unsafe to rely on it.

Instead, use the better (and more concise) code below; this is guaranteed thread-safe:

    SomeEvent?.Invoke(this, args);
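
For context, a minimal sketch of how that line might sit in a class, assuming a hypothetical `Publisher` type with a standard `EventHandler` event (both names are illustrative). The C# language reference describes `?.Invoke` as equivalent to copying the delegate to a temporary, null-checking the temporary, and invoking it:

```csharp
using System;

// Hypothetical event source used only to show the raising pattern.
public sealed class Publisher
{
    public event EventHandler SomeEvent;

    public void Raise()
    {
        // SomeEvent is evaluated once; the null check and the call both use
        // that single result, so a concurrent unsubscribe cannot turn the
        // call into a null dereference.
        SomeEvent?.Invoke(this, EventArgs.Empty);
    }
}
```

This only protects against the `NullReferenceException`; a handler that another thread removes at the same moment may still be invoked once more.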

*"this is guaranteed thread-safe"* -- It's guaranteed that it will not throw a `NullReferenceException`. It's not guaranteed that an unsubscribed consumer will not be inadvertently invoked. Take a look at [this](https://stackoverflow.com/questions/786383/c-sharp-events-and-thread-safety "C# Events and Thread Safety") question, which provides deeper insight into thread-safety and events. In short, C# events are a fundamentally flawed mechanism in a multithreaded environment. Avoiding race conditions is simply impossible. That's why thread-safe classes rarely expose events. — Theodor Zoulias, Jun 07 '22 at 21:04
Actually, the original question was not about the code `if (SomeEvent != null) SomeEvent(this, args);` (which is clearly not thread-safe, as `SomeEvent` may be modified after the null check), but rather about the code `var local = SomeEvent; if (local != null) local(this, args);`, which is a little more tricky. — Vlad, Jun 07 '22 at 22:00
Relying on ECMA would be complicated because it's not clear whether a variable in C# is guaranteed to correspond to a memory location in the CLR, whether a value assignment in C# is guaranteed to correspond to a store into that memory location, and so on. The optimizing compiler could be free to transform all that into something completely different, just semantically equivalent. — Vlad, Jun 07 '22 at 22:05
@Vlad "semantically equivalent" includes guarantees about thread-safety (ie observer effect). I don't see what's tricky about `local`: it remains a separate location from `SomeEvent` and it would be against the spec in my opinion to do anything different. You can hoist it into a field, or store it on the Moon, but the effect to an observer must be the same. — Charlieface, Jun 07 '22 at 22:41
@TheodorZoulias *"It's not guaranteed that an unsubscribed consumer will not be inadvertently invoked"* IMO that's not actually a problem. What you are discussing is a subscriber being retrieved, then being unsubscribed from another thread, then getting invoked. But even if C# events were "thread-safe", an unsubscribe could still happen *during* the execution of one of the subscriber functions, or in that instantaneous point *after* invocation but *before* the rest of the function. So no subscriber that expects to be unsubscribed in this manner would be written to fail in such an event. — Charlieface, Jun 07 '22 at 22:48
