I looked at the Interlocked.Or method in .NET 5. It works well when both operands are integers. Is there a way to perform the equivalent operation on two byte values?

I searched the documentation, and I can see that InterlockedOr8 exists in winnt.h, but P/Invoking it would not produce good performance characteristics.

I have tried calling Interlocked.CompareExchange<T> as follows, just to see if I could simply call it in general with some byte values:

var map = new byte[268435456];
Interlocked.CompareExchange(ref map[0], (byte)137, (byte)137);

But I get the following error:

Error CS0452: The type 'byte' must be a reference type in order to use it as parameter 'T' in the generic type or method 'Interlocked.CompareExchange<T>(ref T, T, T)'

Alexandru
  • Here is the source for Interlocked.Or: https://source.dot.net/#System.Private.CoreLib/Interlocked.cs,ca281796ddec000e,references (or https://github.com/dotnet/runtime/blob/f1ad1b97e9387dc700bf72917f29fc7d1258ac1a/src/libraries/System.Private.CoreLib/src/System/Threading/Interlocked.cs#L201) – Chris Yungmann Feb 01 '21 at 04:46
  • I'll add that `Interlocked.CompareExchange` basically compiles down to a CPU instruction. It's typically known as a [CAS operation](https://en.wikipedia.org/wiki/Compare-and-swap) (compare and swap); not sure why C# named it differently. It's been handled by hardware for a very long time, predating x86. – Zer0 Feb 01 '21 at 04:50
  • _"I can see that InterlockedOr8 exists in winnt.h. Could there be a way to P/Invoke it from C#?"_ -- sure, but probably not while preserving the performance aspects you're looking for. – Peter Duniho Feb 01 '21 at 04:54
  • @PeterDuniho Indeed, P/Invoke would probably chew the CPU. – Alexandru Feb 01 '21 at 04:58
  • Right. What I mean is that the native function uses a compiler intrinsic to directly generate the CPU instruction needed, whereas p/invoke necessarily involves actual _function calls_. So the latter, even if it could be made to compile, is almost certainly going to involve a completely different sequence of executed instructions: just one for the native function in C/C++, and dozens if not hundreds or thousands for p/invoke. – Peter Duniho Feb 01 '21 at 05:00
  • I do recommend that you narrow the focus of your question. There are at least three different questions here. Decide which question you really need the answer to, and then edit the question so it contains only that, along with detailed information about what you've tried so far to solve that one question. – Peter Duniho Feb 01 '21 at 05:01
  • @PeterDuniho Okay, I modified the question. – Alexandru Feb 01 '21 at 05:12
  • Is there `Interlocked.CompareExchange` for `byte`? No. Not if you expect it to behave as the others (a CPU instruction). Could you implement other valid CAS calls to make a `byte` assignment atomic? Sure. This [answer](https://stackoverflow.com/questions/6690386/using-interlocked-compareexchange-with-a-class) might help. Don't use unsafe casts here either. Trying to do an unsafe CAS with a byte can destroy the rest of the word (32 bit or 64 bit currently) in memory. – Zer0 Feb 01 '21 at 05:34
  • One workaround is to store the byte in a supported CAS type and ignore the rest as garbage. The problem is that this doesn't work with byte arrays, making it pretty useless, and it wastes memory. A single byte? Sure, stuff it into something supported. – Zer0 Feb 01 '21 at 05:55
  • @Zer0 In my case I can't. Stuffing it would create a new type, which would defeat the purpose of using the byte array's reference value which I have in the question to perform quick multithreaded operations where there's a lot of thread contention on the values of that array. – Alexandru Feb 01 '21 at 06:02
  • @ChrisYungmann @Zer0 You guys wouldn't happen to know where the source code contains the implementation for that `CompareExchange` function, would you? – Alexandru Feb 01 '21 at 06:13
  • I assume this interlocked operation is part of a larger algorithm of some kind, e.g. you are iterating over a byte array and inspecting/changing them one by one. Instead of calling `InterlockedOr8` over and over directly from C#, consider rewriting the *entire algorithm* in C++, then calling that only *once* from C#. Then you only pay the cost of a single thunk. [Article](https://learn.microsoft.com/en-us/cpp/dotnet/performance-considerations-for-interop-cpp?view=msvc-160). – John Wu Feb 01 '21 at 06:31
  • I don't, but don't think source code even matters. Exact instruction depends on CPU architecture really, but `CMPXCHG` and similar are common. Not much C# can do here since it's really just exposing hardware capabilities for you to use. And it maps to a single instruction. – Zer0 Feb 01 '21 at 07:54
  • Could you explain a bit of your algorithm? If you have lots of thread contention the performance will probably not be great, regardless of what synchronization you use. In some cases it is possible to let threads work on separate arrays, and merge the result when done. – JonasH Feb 01 '21 at 07:55
  • I may be able to write a lock-free solution for what you wanted if I understood the functionality very clearly. I can already hear the "premature optimization" and "just use lock" arguments, but it seems you're avoiding `Monitor` and other sync primitives for a reason. – Zer0 Feb 01 '21 at 08:08
  • @JohnWu You gave me an idea to start with but now I'm a bit blocked: https://stackoverflow.com/questions/65998398/passing-bytes-by-reference-from-c-sharp-into-c-cli-wrapper-to-call-the-interlo – Alexandru Feb 01 '21 at 18:51
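
The word-level idea raised in the comments can be sketched without leaving C#. The sketch below is only an illustration under stated assumptions, not a confirmed or benchmarked solution: `ByteInterlocked` and `WordRef` are hypothetical names, the array's data is assumed to start on a 4-byte boundary (typical for the .NET runtime but not something this thread confirms), and the array length is assumed to be a multiple of 4, as it is for the 268435456-element `map` in the question. It relies on `Interlocked.Or(ref int, int)` from .NET 5 and `Unsafe.As` from `System.Runtime.CompilerServices`. Because OR-ing with zero leaves bits unchanged, OR-ing the containing 32-bit word with the byte shifted into position should leave the three neighbouring bytes intact.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper; not part of the .NET API.
public static class ByteInterlocked
{
    // Returns a ref to the 32-bit word containing array[index] and the bit
    // offset of that byte inside the word.
    // Assumption: the array data starts on a 4-byte boundary.
    private static ref int WordRef(byte[] array, int index, out int shift)
    {
        if ((uint)index >= (uint)array.Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        int wordStart = index & ~3;
        if (wordStart > array.Length - 4)
            throw new ArgumentException("Array must cover the whole 4-byte word.", nameof(array));
        int byteInWord = index & 3;
        shift = (BitConverter.IsLittleEndian ? byteInWord : 3 - byteInWord) * 8;
        return ref Unsafe.As<byte, int>(ref array[wordStart]);
    }

    // Atomically ORs 'value' into array[index]; returns the previous byte.
    public static byte Or(byte[] array, int index, byte value)
    {
        ref int word = ref WordRef(array, index, out int shift);
        // The other three bytes are OR-ed with zero, so they are unchanged.
        int original = Interlocked.Or(ref word, value << shift);
        return (byte)((original >> shift) & 0xFF);
    }

    // Emulates a byte-wide compare-and-swap with a CAS loop on the word.
    // Returns the byte that was in array[index] before the call.
    public static byte CompareExchange(byte[] array, int index, byte value, byte comparand)
    {
        ref int word = ref WordRef(array, index, out int shift);
        while (true)
        {
            int current = Volatile.Read(ref word);
            byte currentByte = (byte)((current >> shift) & 0xFF);
            if (currentByte != comparand)
                return currentByte;
            // Replace only the target byte; keep the neighbours as read.
            int updated = (current & ~(0xFF << shift)) | (value << shift);
            if (Interlocked.CompareExchange(ref word, updated, current) == current)
                return currentByte;
            // A neighbouring byte (or the target) changed; retry.
        }
    }
}
```

With this helper, a call such as `ByteInterlocked.Or(map, i, 0x80)` would stand in for the missing byte overload. Under heavy contention the `CompareExchange` loop may retry whenever any of the four bytes in the word changes, so bytes that share a word contend with each other; whether that is acceptable depends on the access pattern.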

0 Answers