26

What (if any) is the C# equivalent of the x86 asm xchg instruction?

With that command, which imo is a genuine exchange (unlike Interlocked.Exchange), I could simply atomically swap two ints, which is what I am really trying to do.

Update:

Sample code based upon my suggestion. Variable suffixed "_V" are decorated as volatile:

// PART 3 - process links
// prepare the new Producer
address.ProducerNew.WorkMask_V = 0;
// copy the current LinkMask
address.ProducerNew.LinkMask_V = address.Producer.LinkMask_V;
// has another (any) thread indicated it dropped its message link from this thread?
if (this.routerEmptyMask[address.ID] != 0)
{
  // allow all other bits to remain on (i.e. turn off now defunct links)
  address.ProducerNew.LinkMask_V &= ~this.routerEmptyMask[address.ID];
  // reset
  this.routerEmptyMask[address.ID] = 0;
}
// PART 4 - swap
address.ProducerNew = Interlocked.Exchange<IPC.Producer>(ref address.Producer, address.ProducerNew);
// PART 5 - lazily include the new links, make a working copy
workMask = address.Producer.LinkMask_V |= address.ProducerNew.WorkMask_V;

Note the lazy update.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
IamIC
  • 17,747
  • 20
  • 91
  • 154
  • Please update your question with the additional details that you have provided in the comments to my answer. It will help others trying to answer your question if they know all the constraints. http://stackoverflow.com/questions/3855671/how-do-i-atomically-swap-2-ints-in-c/3855700#3855700 – Jeff Yates Oct 04 '10 at 13:50
  • 11
    The XCHG instruction would not be the atomic operation you're looking for since there is no `XCHG [mem1],[mem2]` opcode. You would need three instructions: `mov reg,[mem1] XCHG reg,[mem2] mov [mem1],reg` – Skizz Oct 04 '10 at 15:35
  • @Skizz I had another look at the ASM manual - you're right. Wow. Even in assembler one can't do what I am looking for. – IamIC Oct 04 '10 at 16:28
  • In reality, it is impossible in any language, even ASM, to atomically swap two variables. One can only exchange a with b atomically, and non-atomically get back a as the result. – IamIC Nov 14 '10 at 15:45
  • @IanC I believe some early RISC processors allowed memory-to-memory atomic operations. More recently, IBM Cyclops 64. On a regular PC, one can achieve a similar effect by talking directly to the bus controller chip and having it initiate DMA for you. Nowadays, GPU computing is becoming fashionable and these also provide various forms of DMA. Although these methods are obviously impractical, I think "impossible" might be just a bit strong... – Glenn Slayden Jul 29 '12 at 22:26
  • 1
    @GlennSlayden: m68k has/had double-word compare-and-swap (of two non-contiguous words in memory against two registers). https://en.wikipedia.org/wiki/Double_compare-and-swap. On modern Intel, you can do it with TSX (transactional memory). – Peter Cordes Aug 09 '18 at 09:29

8 Answers8

58

This is the likely implementation for Interlocked.Exchange() in the CLR, copied from the SSCLI20 source:

Note that UP in the function name means UniProcessor. This is not atomic on SMP / multi-core systems. This implementation will only be used by CLR on single-core systems.

FASTCALL_FUNC ExchangeUP,8
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [ecx]      ; attempted comparand
retry:
        cmpxchg [ecx], edx
        jne     retry1          ; predicted NOT taken
        retn
retry1:
        jmp     retry
FASTCALL_ENDFUNC ExchangeUP

It is superior to using XCHG because this code works without taking a bus lock. xchg has an implicit lock prefix, so unlike xadd or cmpxchg it simply can't be omitted for single-core systems to still do the operation in one instruction to make it atomic with respect to interrupts (and thus other threads on uniprocessor).

The odd looking jumping code is an optimization in case branch prediction data is not available. Needless to say perhaps, trying to do a better job than what has been mulled over for many years by very good software engineers with generous helpings from the chip manufacturers is a tall task.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • 10
    Indeed. Indubitably. Quite. +1 – Jeff Yates Oct 04 '10 at 13:52
  • Why would the jump help if branch prediction data is not available? – TheBlastOne Sep 08 '11 at 19:13
  • 1
    @the - If it is not available then the processor has to guess. It always guesses at "jump not taken". Which is very likely correct with this code. Very likely wrong if the je instruction were used. Guessing wrong is very expensive because of the pipeline. – Hans Passant Sep 08 '11 at 19:28
  • @Hans, so what is the advantage? If prediction is unavailable, wouldn't it be cheapeer to just return? And if it is available, why would one insert a jump solely to force prediction to guess right? When *would* the jump be taken? jne jumps if ZF is not set, and cmpxchg sets ZF is the operands were equal, right? Sorry for not seeing the probably obvious point. – TheBlastOne Sep 09 '11 at 07:43
  • @the - you're not close. Why don't you ask a question about it? – Hans Passant Sep 09 '11 at 10:34
  • @Hans -- ask what? "When does the jne in Hans' Answer ever take the jump?" If one does not understand what's going on, as I probably do not being close, it's hard to find an intelligent question. I googled the jne and cmpxchg info, so that's not what I misunderstood (I think). But no worry, I was just wondering out of interest, not because I need an answer right away. Let me re-think cmpxchg, it probably does something to eax under certain conditions, and then I'll hopefully have more insight into this. – TheBlastOne Sep 09 '11 at 12:22
  • 1
    @HansPassant In the final account, I was able to write a blocking inter-thread messaging system that runs exactly 9.1 times faster than the built-in .Net 4.0 types required for this. The code also does more (routing), so it's technically an order of magnitude faster. So sometimes one can not only beat what has been mulled over for many years by very good software engineers with generous helpings from the chip manufacturers, but can smash it :) – IamIC Sep 15 '12 at 11:22
  • 1
    @IanC: can you describe which two things are you actually comparing when talking about "9.1 times" improvement? Especially since the answer you accepted actually uses `Interlocked`? – vgru Oct 26 '15 at 15:11
  • The 9.1 times is referring to code I wrote that passes messages vs. using the built-in lock { ... } method to sync passing messages. – IamIC Oct 27 '15 at 14:16
  • @Hans: Why did you roll back my edit? I hope because you're working on your own edit, becase your answer seems to imply that this code is somehow safe for modern systems. `cmpxchg` without `lock` is not atomic ([Is x86 CMPXCHG atomic?](https://stackoverflow.com/a/44273130)). It's only atomic with respect to interrupts on the same core, and thus unsafe for the same reason that `add dword ptr [mem], 1` is unsafe without `lock`: see [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850). – Peter Cordes Aug 09 '18 at 09:57
  • I want you to stop messing with this post, this is copy/paste from source code that does not have this comment. You can file a bug at the CoreCLR project. – Hans Passant Aug 09 '18 at 09:58
  • TL:DR: on an SMP system, you might as well `mov eax, [ecx]` / `mov [ecx], edx` . – Peter Cordes Aug 09 '18 at 09:59
  • Ok, then the explanatory UP = uniprocessor needs to be in the text *outside* the code block. UP in the function name clearly means UniProcessor, where this code is safe. It would be a bug if it were used on an SMP machine, so the problem is you quoting it without that context. – Peter Cordes Aug 09 '18 at 10:00
  • I revised my first edit to preserve the integrity of the quoted source code, while adding sufficient warnings. I don't think this is a useful answer to the question, but a downvote isn't sufficient because non-atomic operations that appear to work can cause very hard-to-debug bugs if anyone were to use this code without realizing that. You linked to this from [a comment on a question about spinlocks](https://stackoverflow.com/questions/22943572/what-is-the-minimum-x86-assembly-needed-for-a-spinlock/22956288#comment35027561_22943572); recent activity there led me to this in the first place. – Peter Cordes Aug 09 '18 at 10:05
25

Here's kind of a weird idea. I don't know exactly how you have your data structure set up. But if it's possible you could store your two int values in a long, then I think you could swap them atomically.

For example, let's say you wrapped your two values in the following manner:

class SwappablePair
{
    long m_pair;

    public SwappablePair(int x, int y)
    {
        m_pair = ((long)x << 32) | (uint)y;
    }

    /// <summary>
    /// Reads the values of X and Y atomically.
    /// </summary>
    public void GetValues(out int x, out int y)
    {
        long current = Interlocked.Read(ref m_pair);

        x = (int)(current >> 32);
        y = (int)(current & 0xffffffff);
    }

    /// <summary>
    /// Sets the values of X and Y atomically.
    /// </summary>
    public void SetValues(int x, int y)
    {
        // If you wanted, you could also take the return value here
        // and set two out int parameters to indicate what the previous
        // values were.
        Interlocked.Exchange(ref m_pair, ((long)x << 32) | (uint)y);
    }
}

Then it seems you could add the following Swap method to result in a swapped pair "atomically" (actually, I don't know if it's fair to really say that the following is atomic; it's more like it produces the same result as an atomic swap).

/// <summary>
/// Swaps the values of X and Y atomically.
/// </summary>
public void Swap()
{
    long orig, swapped;
    do
    {
        orig = Interlocked.Read(ref m_pair);
        swapped = orig << 32 | (uint)(orig >> 32);
    } while (Interlocked.CompareExchange(ref m_pair, swapped, orig) != orig);
}

It is highly possible I've implemented this incorrectly, of course. And there could be a flaw in this idea. It's just an idea.

Dan Tao
  • 125,917
  • 54
  • 300
  • 447
  • Damn, why didn't I think of storing the 2 ints in a long! Good one! – IamIC Oct 07 '10 at 09:19
  • 2
    The technique is known from a solution to the ABA problem where you keep an incrementing 'version' counter in the upper 32-bits and your data in the lower 32, to detect ABA failures. See page 536 of Duffy's "Concurrent Programming on Windows." I wrote a managed lock-free dictionary based on 64-bit atomic operations, done entirely on paired 32-bit neighbors like this. – Glenn Slayden Jul 29 '12 at 21:46
  • @dantao This idea does work. In fact, I managed to place 3 numbers into a long, and control them using shifts. It's the perfect solution. – IamIC Sep 14 '12 at 12:47
  • 2
    @IanC: Awesome! Normally I look back on answers I posted to SO two years ago and cringe; but it's good to know this idea ultimately proved useful to you (even though I'm sure you had to make significant changes to apply it to your scenario). – Dan Tao Sep 14 '12 at 13:23
  • @DanTao Another useful feature of binding two values together is they can both be operated on in a single operation, such as incrementing one and decrementing the other. This is very practical in application. – IamIC Sep 19 '12 at 09:52
  • Unless SetValues needs sequential-consistency, it only needs to be an 8-byte store, not `xchg`. So you probably don't need `Interlocked.Exchange`. (x86 can do 8-byte atomic stores even in 32-bit mode, using x87 `fild`/`fistp`, or using SSE). I don't know how you ask for that in C#, but I assume that `Interlocked.Exchange` with an unused result is still a full memory barrier which you don't need. – Peter Cordes Aug 09 '18 at 09:22
  • And BTW, yes, implementing an atomic operation with a CAS retry loop is totally valid and normal for operations the HW doesn't provide. You can definitely call that an atomic swap. (Or an atomic rotate of a `long`. And BTW, no, x86-64 can't use `lock rol qword ptr [mem], 32` because [memory-destination rotates don't support a `lock` prefix](http://felixcloutier.com/x86/RCL:RCR:ROL:ROR.html): they #UD (illegal instruction) if you try. – Peter Cordes Aug 09 '18 at 09:26
24

Why isn't Interlocked.Exchange suitable for you?

If you require the exact memory locations to be swapped then you're using the wrong language and platform as .NET abstracts the memory management away so that you don't need to think about it.

If you must do something like this without Interlocked.Exchange, you could write some code marked as unsafe and do a traditional pointer-based swap as you might in C or C++, but you'd need to wrap it in a suitable synchronisation context so that it is an atomic operation.

Update
You don't need to resort to unsafe code to do a swap atomically. You can wrap the code in a synchronisation context to make it atomic.

lock (myLockObject)
{
  var x = Interlocked.Exchange(a, b);
  Interlocked.Exchange(b, x);
}

Update 2
If synchronisation is not an option (as indicated in the comments), then I believe you're out of luck. As you're chasing some unmeasured efficiency, you may want to concentrate elsewhere. If the swapping of two integer values is a huge performance hog, you're probably using the wrong platform.

Jeff Yates
  • 61,417
  • 20
  • 137
  • 189
  • It's not that I'm trying to get into memory management. I simply need to swap the values of two ints and do so atomically. Due to my design requirements, a lock is not an option. – IamIC Oct 04 '10 at 13:39
  • 1
    @IanC: Why is a lock not appropriate? Perhaps if you're more forthcoming with ALL the details, you can get an answer. Trickling additional constraints as we go is not helpful. – Jeff Yates Oct 04 '10 at 13:42
  • @Jeff I'm trying to keep it simple as the swap is all I am interested in :). The application requires 1 thread to write millions of messages a second and another thread reads these relatively occasionally (1000/sec). A lock would just be too much unneeded overhead as it would have to lock/unlock with no contention millions of times a second. My current code works fine, but if I could swap the 2 ints, it would be more efficient. – IamIC Oct 04 '10 at 13:45
  • 1
    @IanC: How do you know it would be more efficient? Seems to me that you're clutching at straws. Please post evidence. – Jeff Yates Oct 04 '10 at 13:47
  • 1
    @Jeff you are correct. .Net is not the best platform for this one requirement. But it's well suited for the rest of the project. For now, I will have to stick with my current code and see how that goes. – IamIC Oct 04 '10 at 14:21
  • @Jeff - not sure why you think I'm clutching at straws. It's quite straightforward that locks take up time. Somehow I think we're missing each other somewhere. – IamIC Oct 04 '10 at 14:22
  • @IanC: I said that because it appears you're optimizing without any evidence that your efforts will actually bear fruit. It was not in reference to locks, it was in reference to you requiring that the swap be atomic in the first place. – Jeff Yates Oct 04 '10 at 14:25
  • @Jeff: does `b = Interlocked.Exchange(ref a, b);` make the swap atomic, or does it hide the non-atomicness of the operation? – Matt Ellen Oct 04 '10 at 14:50
  • 1
    @Matt: The Interlocked.Exchange is atomic but the assignment to b as well is not. – Jeff Yates Oct 04 '10 at 14:59
  • @Jeff oh, I am 100% clear that the swap must be atomic. It's quite simple. Instead of swapping a with say 0, I want to swap it with b. – IamIC Oct 04 '10 at 15:19
  • @IanC: I understand that you are clear that you want it, but you give no evidence as to why making this operation atomic will give you any benefit. It seems your chasing a perceived optimization rather than a real one. – Jeff Yates Oct 04 '10 at 18:32
  • 2
    Hm, I have problem with the lock here - it doesn't make the statement inside the code block atomic per-se - only atomicity for it would come from having that same lock elsewhere - access to the variables without it will break that atom just like that... – Daniel Mošmondor Oct 04 '10 at 22:38
  • @Daniel: You're right, you would have to guard the variables with that lock to be sure or use a critical section that does not rely on a specific lock object. – Jeff Yates Oct 05 '10 at 14:09
  • @Daniel: Of course, the latter isn't necessarily possible in .NET (as far as I am aware). – Jeff Yates Oct 05 '10 at 14:15
8

Interlocked.Exchange is really the only thing you can do:

  var x = Interlocked.Exchange(a, b);
  Interlocked.Exchange(b, x);

You are correct that this will not be atomic, but with a local variable used you are guaranteed that the values are consistent as long as both lines execute. Your other options are unsafe code (for using pointers), using p/invoke to a native library, or redesigning so that it's no longer required.

Philip Rieck
  • 32,368
  • 11
  • 87
  • 99
  • It has to be atomic. Could you point me to a reference on how to do this with unsafe code? – IamIC Oct 04 '10 at 13:38
  • Interlocked.Exchange is an atomic operation, however, 2 in a row is not without synchronisation. As indicated in the MSDN documentation - http://msdn.microsoft.com/en-us/library/system.threading.interlocked.exchange.aspx – Jeff Yates Oct 04 '10 at 13:38
  • @IanC unsafe c# will not make it any *more* atomic than that. If you really, really need XCHG, you'll have to write it yourself in c. Not sure transitioning will give you a huge performance edge over a slim lock, though. See something like this: http://www.codeproject.com/KB/cs/unmanage.aspx – Philip Rieck Oct 04 '10 at 14:01
  • 1
    @Jeff Yates Indeed 2 in a row is not atomic - that was my point here. He wants the actual *Swapping* of two values, not just a change in one value, and I'm saying that this is as close as you can get to it in pure c# – Philip Rieck Oct 04 '10 at 14:02
5

According to MSDN, Interlocked.Exchange is atomic.

If it is not suitable for you, you could implement XCHG in an unsafe section using C/C++.

Rob Vermeulen
  • 1,910
  • 1
  • 15
  • 22
1

Interlocked.Exchange is best way to swap two int values in a thread safe manner in c#.

Use this class , even you have multiprocessor computer and you never know in which processor your thread is going to run.

TalentTuner
  • 17,262
  • 5
  • 38
  • 63
0

Outside of Interlocked.Exchange, I'm assuming the XCHG command is probably an implementation of the XOR Swap, so you could write your own.

C syntax: (from wikipedia link)

 void xorSwap (int *x, int *y) {
     if (x != y) {
         *x ^= *y;
         *y ^= *x;
         *x ^= *y;
     }
 }

Edit this is not atomic, you would have to synchronize it yourself

Kenny Cason
  • 12,109
  • 11
  • 47
  • 72
  • 6
    That wouldn't be atomic as per the OP's requirments. – Jeff Yates Oct 04 '10 at 13:43
  • 1
    XCHG is an x86 CPU instruction and it's atomic in the sense that it cannot be interrupted by another instruction. The XOR swap isn't atomic without adding extra locking on the parameters first. – Rup Oct 04 '10 at 13:45
  • @Rup ah, I wasn't aware of the requirement of being atomic. – Kenny Cason Oct 04 '10 at 13:48
0

I think I just found the best solution. It is:

Interlocked.Exchange() Method (ref T, T)

All the "new" variables can be set in a class (Of T) and swapped with the current variables. This enables an atomic snapshot to take place, while effectively swapping any number of variables.

IamIC
  • 17,747
  • 20
  • 91
  • 154
  • Yes, the only issue I see with this approach is that suddenly you have to put your data in an immutable reference type, which may be overdoing it, depending on how often you're performing swaps. – Dan Tao Oct 04 '10 at 19:14
  • @Dan, I don't see any issue with that. This is swapping the pointer only, obviously. It won't affect writing to the object speed. – IamIC Oct 04 '10 at 20:21
  • @IanC: I just mean that, if I understand you correctly, you would need to create a new object for every swap (like Interlocked.Exchange(pair, new Pair(pair.X, pair.Y))). Maybe that's fine; it just seemed a bit... heavy? Or perhaps I've misunderstood. – Dan Tao Oct 04 '10 at 20:39
  • @Dan I would only need to create 2 objects and simply keep swapping the 2 pointers to them. Once the swap is done and the data read, the object could be zero'd and would then be reads to be swapped. You see what I mean? – IamIC Oct 07 '10 at 09:17
  • @IanC: I guess I assumed you were asking about an atomic swap because you're dealing with multiple threads. The problem with the idea you just described is that your "in-reserve" object could be modified by multiple threads concurrently, and then swapped having invalid values. You could possibly do this without continual object allocation using some sort of pool, but that might end up being overly complex for what should be a simple problem. – Dan Tao Oct 07 '10 at 13:32
  • @Dan the "in reserve" object would only be modified by one master controller thread. At this point, I'd have to get into a lengthy explanation that goes way beyond the original question's scope. – IamIC Oct 08 '10 at 21:17
  • I have subsequently tested the idea I proposed and it is working thus far. More tests to follow. – IamIC Nov 12 '10 at 17:37
  • @IanC: Actually, the more I think about this idea, the more I'm not buying it. How can you ensure the atomicity of a *swap* without having an atomic *read*? In other words, say the data you want swapped is all in object X. You need to populate object Y with the "swapped" version of that data before swapping X and Y. I see no hint here as to how all of that happens atomically—only how the references to X and Y themselves are swapped, which is merely a small piece of the puzzle. But you're right of course that I may just not be familiar enough with your exact circumstances to evaluate this idea. – Dan Tao Nov 13 '10 at 17:44
  • @Dan I have two references to two instances of the objects I wish to swap. The references are marked volatile. Let me post the code as an update. – IamIC Nov 13 '10 at 18:34