
While writing an answer to another question, some interesting things came up, and now I can't understand how Interlocked.Increment(ref long value) works on 32-bit systems. Let me explain.

Native InterlockedIncrement64 is no longer available when compiling for a 32-bit environment. OK, that makes sense: in .NET you can't align memory as required, and since the function might be called from managed code, they dropped it.

In .NET we can call Interlocked.Increment() with a reference to a 64-bit variable. There is still no constraint on its alignment (for example inside a structure, where we may also use FieldOffset and StructLayout), yet the documentation doesn't mention any limitation (AFAIK). It's magic, it works!

Hans Passant noted that Interlocked.Increment() is a special method recognized by the JIT compiler, which emits a call to COMInterlocked::ExchangeAdd64(). That in turn calls FastInterlockExchangeAddLong, a macro for InterlockedExchangeAdd64, which shares the same limitations as InterlockedIncrement64.

Now I'm perplexed.

Forget the managed environment for a second and go back to native. Why can't InterlockedIncrement64 work when InterlockedExchangeAdd64 does? InterlockedIncrement64 is a macro; if intrinsics aren't available and InterlockedExchangeAdd64 works, then it could be implemented as a call to InterlockedExchangeAdd64...

Let's go back to managed: how is an atomic 64-bit increment implemented on 32-bit systems? I suppose the sentence "This function is atomic with respect to calls to other interlocked functions" is important, but I still haven't seen any code that does it (thanks to Hans for pointing me to the deeper implementation). Let's pick the InterlockedExchangeAdd64 implementation from WinBase.h that is used when intrinsics aren't available:

FORCEINLINE
LONGLONG
InterlockedExchangeAdd64(
    _Inout_ LONGLONG volatile *Addend,
    _In_    LONGLONG Value
    )
{
    LONGLONG Old;

    do {
        Old = *Addend;
    } while (InterlockedCompareExchange64(Addend,
                                          Old + Value,
                                          Old) != Old);

    return Old;
}

How can it be atomic for reading/writing?

Adriano Repetti
  • Who said that "`InterlockedIncrement64` can't work but `InterlockedExchangeAdd64` does"? Your original answer was correct in saying that managed code cannot directly call the native Win32 APIs and expect everything to work. Neither of them is going to work. You have to use the managed helper. Now, the implementation of the managed helper is native code, so it calls the native function. Since the macros and intrinsics are resolved at compile-time, what counts is the bitness of the CLR. – Cody Gray - on strike Jun 01 '16 at 08:04
  • Yes, but the 32-bit JIT will call InterlockedExchangeAdd64, which has the same limitations (in native) as InterlockedIncrement64. What I didn't understand is how it can be done (because of memory alignment when called from managed code). The implementation on 32 bit uses InterlockedCompareExchange64 which...hmmm....may not be atomic (for writing the result back...) – Adriano Repetti Jun 01 '16 at 08:14
  • *"How can it be atomic for reading/writing?"* The documentation for `InterlockedExchangeAdd64` hints at the reason, saying *"This function generates a full memory barrier (or fence) to ensure that memory operations are completed in order."* Notice that the implementation you show above calls `InterlockedCompareExchange64`. On 32-bit builds, this emits a `CMPXCHG8B` instruction with a `LOCK` prefix. This ensures that the instruction is executed atomically. You never get a locked read without a locked write, so writing the destination is atomic. – Cody Gray - on strike Jun 01 '16 at 08:24

1 Answer


You have to keep following the trail: InterlockedExchangeAdd64() takes you to the WinNt.h SDK header file, where you'll see many versions of it, depending on the target architecture.

This generally collapses to:

#define InterlockedExchangeAdd64 _InterlockedExchangeAdd64

Which passes the buck to a compiler intrinsic, declared in vc/include/intrin.h and implemented by the compiler's back-end.

Or in other words, different builds of the CLR will have different implementations of it. There have been many over the years: x86, x64, Itanium, ARM, ARM8, PowerPC off the top of my head, and I'm surely missing some that used to boot Windows CE before Apple made it irrelevant. For x86 this ultimately is taken care of by LOCK CMPXCHG8B, a dedicated processor instruction that can handle misaligned 64-bit variables. I don't have the hardware to see what it looks like on other 32-bit processors.
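
As a rough illustration of how an atomic 64-bit add can be built from nothing more than a 64-bit compare-exchange, here is a minimal sketch in portable C11. It is not the CLR's or the SDK's code: `exchange_add64` and `increment64` are hypothetical names, and `<stdatomic.h>` stands in for the Win32 interlocked functions. On a 32-bit x86 target a compiler would typically lower the compare-exchange to LOCK CMPXCHG8B, but that depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for InterlockedExchangeAdd64:
   adds value to *addend atomically and returns the previous value. */
static int64_t exchange_add64(_Atomic int64_t *addend, int64_t value)
{
    /* A stale snapshot is harmless: the compare-exchange below rejects it. */
    int64_t old = atomic_load_explicit(addend, memory_order_relaxed);

    /* On failure, atomic_compare_exchange_weak stores the value currently
       in *addend into old, so each retry works with a fresh snapshot. */
    while (!atomic_compare_exchange_weak(addend, &old, old + value)) {
        /* retry */
    }
    return old;
}

/* Hypothetical stand-in for InterlockedIncrement64, layered on the add:
   returns the incremented value. */
static int64_t increment64(_Atomic int64_t *addend)
{
    return exchange_add64(addend, 1) + 1;
}
```

The WinBase.h loop quoted in the question has the same shape, except that its initial read is a plain one. A torn read there does no damage: a snapshot that does not match what is actually stored makes the compare-exchange fail, and the loop simply reads again.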

Do keep in mind that the target architecture for managed code is not nailed down at compile time. It is the jitter that adapts the MSIL to the target at runtime. That isn't quite so relevant for C++/CLI projects since you generally do have to pick a target if you compile with /clr instead of /clr:pure and only x86 and x64 can work. But the plumbing is in place anyway so a macro just isn't very useful.

Hans Passant
  • Sorry, I don't get it. Assuming the implementation of InterlockedExchangeAdd64 works well on 32 bit (without intrinsics!), then the InterlockedIncrement64 macro could also be implemented in the same way (in a 32-bit managed environment). Extending this, the native 64-bit functions could also be implemented reliably on 32 bit. How?! Well, I guess that the wording _"This function is atomic **with respect to calls to other interlocked functions**."_ is the key, but following the path I can't see any _special_ code to do it – Adriano Repetti Jun 01 '16 at 07:54
  • Not so sure where the hangup lies, macros only work at compile time. Which is fine when the code is inside the CLR, you'll have an appropriate version of it at runtime that matches the target architecture. So a call into the helper function always gets the job done. It is not fine at compile time in your own .NET assembly since it can run on different platforms. Still getting it converted to inline assembly code at runtime (LOCK XADD) when the target platform permits it is of course the sweet spot. It is the processor that provides the guarantee in either case. – Hans Passant Jun 01 '16 at 07:59
  • Yes, of course I understand macros have to be resolved at compile time. Suppose you have a mixed C++/CLI project targeting Win32. You can call Interlocked::Increment(long long). At run-time the JIT will replace that call with its own helper function. Now: 1) if that helper function exists, then InterlockedIncrement64 could also be implemented in the same way (instead of being dropped). 2) How can it be atomic if a) you're on a 32-bit system and b) memory does not need to be aligned? – Adriano Repetti Jun 01 '16 at 08:03
  • @AdrianoRepetti - one simple way it *could* be done (not saying it is). *You* don't get to control memory layout of managed objects, the runtime does. The runtime also needs to guarantee that it can do a 64-bit interlocked operation. Hmm. The same "piece" of code seems to be involved here. So it *could* decide to always ensure that 64-bit ints *are* 64-bit aligned, *if* that's a requirement on the architecture for which that runtime is targeted. – Damien_The_Unbeliever Jun 01 '16 at 08:03
  • @Damien_The_Unbeliever it makes sense, writing the value back won't be atomic on 32-bit systems but hmmmmm – Adriano Repetti Jun 01 '16 at 08:15
  • Just try it with a sample C++ project that targets x86 and single-step through the machine code. You'll discover LOCK CMPXCHG8B, a dedicated processor instruction that can handle misaligned 64-bit variables. Also shows up in the Windows minimum requirements, there were early AMD processors that didn't have this instruction yet. – Hans Passant Jun 01 '16 at 08:21
  • @HansPassant thank you, that's it! Please include it in your answer too, it addresses further doubts I already added to my question. – Adriano Repetti Jun 01 '16 at 08:23
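
To make the alignment concern from the comments concrete, here is a small C sketch of how a packed layout can leave a 64-bit field on an odd address. The struct names and the `is_aligned8` helper are hypothetical, and `#pragma pack` is only a loose native analogue of what StructLayout and FieldOffset allow in managed code; it is an assumption for illustration, not a description of how the CLR lays out objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: no padding is inserted between members. */
#pragma pack(push, 1)
struct packed_counter {
    char    tag;
    int64_t value;   /* ends up at offset 1, so it is never 8-byte aligned
                        when the struct itself starts on an 8-byte boundary */
};
#pragma pack(pop)

/* Same members with the compiler's default layout: padding after tag
   keeps value on its natural alignment. */
struct natural_counter {
    char    tag;
    int64_t value;
};

/* Returns nonzero when p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}
```

A check such as `is_aligned8` is what native code would need before relying on an operation that requires 64-bit alignment; an instruction that tolerates misaligned operands, like the one named in the answer, removes that requirement on x86.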