How to write a short block of inline GNU extended assembly to swap the values of two integer variables?

Question

For entertainment, I am learning GNU extended assembly using AT&T syntax for x86 with a 32-bit Linux target. I have just spent the last three hours coding two possible solutions to my challenge of swapping the values of two integer variables a and b, and neither of them fully solves the problem. First, here is the TODO in more detail:

#include <stdio.h>

int main()
{
    int a = 2, b = 1;
    printf("a is %d, b is %d\n", a, b);
    // TODO: swap a and b using extended assembly, and do not modify the program in any other way
    printf("a is %d, b is %d\n", a, b);
}

After reading this HOWTO, I wrote the following inline extended assembler code. Here is my first attempt at swapping the integers:

asm volatile("movl %0, %%eax;"
    "movl %1, %%ecx;"
    "movl %%ecx, %0;"
  : "=r" (a)
  : "r" (b)
  : "%eax", "%ecx");

asm volatile("movl %%eax, %0;"
  : "=r" (b)
  : "r" (a)
  : "%eax", "%ecx");

My reasoning was that setting a = b needed its own extended assembly statement, separate from the one that sets b = a. So I wrote the two extended assembly statements, compiled my code with gcc -m32 asmPractice.c, and ran a.out. The results were as follows:

a is 2, b is 1
a is 1, b is 1

Seeing how that did not work properly, I then decided to combine the two extended assembler calls, and wrote this:

asm volatile("movl %0, %%eax;"
    "movl %1, %%ecx;"
    "movl %%ecx, %0;"
    "movl %%eax, %1;"
  : "=r" (a)
  : "r" (b));

After recompiling and linking, my code still does not correctly swap both values. See for yourself. Here are my results:

a is 2, b is 1
a is 1, b is 1

Since you are passing registers you could just do `xchg %0, %1` . Using moves you only need 1 temporary register. Copy %0 to that register. Then copy %1 to %0 and then copy the temp register to %1. The temporary will need to be listed in the clobber list — Michael Petch, Aug 28 '17 at 00:45
Also see the XOR swap algorithm https://en.wikipedia.org/wiki/XOR_swap_algorithm — Richard Chambers, Aug 28 '17 at 00:47
Your existing inline assembly also has the problem that `a` and `b` are both inputs and output. So **both** should be using a read write constraint of `"+r"` — Michael Petch, Aug 28 '17 at 00:52
You can also get the compiler to choose the temporary register by passing a dummy variable into the template using an output constraint. — Michael Petch, Aug 28 '17 at 00:57
If you want to use `mov` (rather than the simpler `xchg` instruction) then it would look like `int a = 1; int b = 2; int dummy; asm ("movl %0, %2\n\t" "movl %1, %0\n\t" "movl %2, %1;" : "+r" (a), "+r" (b), "=r"(dummy));` . Using `xchg` would look like `asm ("xchg %0, %1" : "+r" (a), "+r" (b));` — Michael Petch, Aug 28 '17 at 01:06
How about something that uses no instructions at all: https://stackoverflow.com/a/24841962/2189500 — David Wohlferd, Aug 28 '17 at 01:14
lol @DavidWohlferd : that was my next comment. There is a natural progression to the empty template. You beat me to it — Michael Petch, Aug 28 '17 at 01:16
The logical progression after that is to not use inline assemble at all lol — Michael Petch, Aug 28 '17 at 01:18
"that was my next comment" You, me, Peter. Seems like everyone who plays with extended asm realizes this at some point. "not use inline assemble" - Oh yeah, that reminds me: There are [reasons](https://gcc.gnu.org/wiki/DontUseInlineAsm) not to use inline asm. If the goal is entertainment (as OP says), then have at it. It's interesting, challenging, powerful, gives insight into how the compiler sees the world, etc. But don't be seduced into using it in real code. — David Wohlferd, Aug 28 '17 at 01:33
@Chris: See the x86 tag wiki (https://stackoverflow.com/tags/x86/info), and also https://stackoverflow.com/tags/inline-assembly/info. I'd recommend learning x86 assembly separately from learning GNU C inline assembly, because to use it correctly you have to already understand assembly and compilers to write constraints correctly. Make separate functions that you call from C. — Peter Cordes, Aug 28 '17 at 02:30
You don't need `volatile` because the only function of the asm is to produce its outputs. You *want* the compiler to optimize it away if the outputs are unused. — Peter Cordes, Aug 28 '17 at 02:31
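The suggestions in the comments above (read-write `"+r"` constraints on both variables, a dummy output so the compiler picks the scratch register, and no `volatile`) can be put together roughly as follows. This is only a sketch: the function name `swap_mov` is hypothetical, the `"=&r"` early-clobber on the dummy is a cautious addition that the comment did not use, and the plain C fallback exists only so the file still builds on non-x86 targets. `__asm__` is the same keyword as `asm`, spelled so that it is also accepted under strict `-std=c99`.

```c
/* Hypothetical helper (the name swap_mov is not from the thread).
   Swaps *a and *b with three movl instructions, AT&T syntax. */
void swap_mov(int *a, int *b)
{
    int x = *a, y = *b;
    int tmp;  /* dummy output: lets the compiler pick the scratch register */

#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %0, %2\n\t"      /* tmp = x   */
            "movl %1, %0\n\t"      /* x   = y   */
            "movl %2, %1"          /* y   = tmp */
            : "+r" (x), "+r" (y),  /* read-write: each is both input and output */
              "=&r" (tmp));        /* early-clobber: written before x and y are fully read */
#else
    tmp = x; x = y; y = tmp;       /* plain C fallback on non-x86 targets */
#endif

    *a = x;
    *b = y;
}
```

Called as `swap_mov(&a, &b);` in place of the TODO, this should leave a holding the old b and b holding the old a. Because no register is named explicitly, no clobber list is needed.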

score 4 · Accepted Answer · edited Feb 23 '19 at 06:07

Here are some solutions from the comments:

Solution #0 (best option): https://gcc.gnu.org/wiki/DontUseInlineAsm
Even the zero-instruction solution defeats constant-propagation, and any other optimization that involves gcc knowing anything about the value. It also forces the compiler to have both variables in registers at the same time at that point. Always keep these downsides in mind when considering using inline-asm instead of builtins / intrinsics.

Solution #1: x86 xchg, no scratch regs, and works in both AT&T and Intel-syntax modes. Costs about the same as 3 mov instructions on most Intel CPUs, or only 2 uops on some AMD.

asm("xchg %0, %1;" : "+r" (a), "+r" (b));

Solution #2: purely using GNU C inline asm constraints. (Bonus: portable to all architectures)

asm("" : "=r" (a), "=r" (b) : "1" (a), "0" (b));
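To spell out how the empty template works: a digit constraint such as `"1"` is a matching constraint, meaning "put this input in the same register as output operand 1". Crossing the two pairs makes the compiler itself treat each variable as now living in the other's register. A minimal sketch, wrapped in a hypothetical helper named `swap_by_constraints` (not from the answer):

```c
/* Hypothetical helper illustrating the empty-template swap. */
void swap_by_constraints(int *a, int *b)
{
    int x = *a, y = *b;

    /* The template is empty, so the asm statement itself emits no instructions.
       Operand 0 is output x, operand 1 is output y.
       "1" (x): input x must sit in the same register as operand 1 (y's output).
       "0" (y): input y must sit in the same register as operand 0 (x's output).
       Afterwards the compiler reads x from the register that held y, and
       y from the register that held x. */
    __asm__("" : "=r" (x), "=r" (y) : "1" (x), "0" (y));

    *a = x;
    *b = y;
}
```

Since nothing architecture-specific appears in the template, this should build on any target that GNU C extended asm supports.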

See all three solutions in action on the Godbolt compiler explorer, including examples of them defeating optimization:

int swap_constraints(int a, int b) {
    asm("" : "=r" (a), "=r" (b) : "1" (a), "0" (b));
    return a;
}

// Demonstrate the optimization-defeating behaviour:
int swap_constraints_constants(void) {
    int a = 10, b = 20;
    return swap_constraints(a, b) + 15;
}

swap_constraints_constants:
    movl    $10, %edx
    movl    $20, %eax
    addl    $15, %eax
    ret

vs. with a pure C swap:

swap_noasm_constants:
    movl    $35, %eax    # the add is done at compile-time, and `a` is optimized away as unused.
    ret
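The C source behind `swap_noasm_constants` is not reproduced above. A plausible shape for it, assuming it mirrors `swap_constraints` with an ordinary temporary-variable swap (the helper name `swap_noasm` and both function bodies are assumptions), would be:

```c
/* Assumed pure C counterpart of swap_constraints; not copied from the answer. */
int swap_noasm(int a, int b)
{
    int tmp = a;   /* ordinary three-assignment swap, fully visible to the optimizer */
    a = b;
    b = tmp;
    (void)b;       /* b is unused after the swap, as in swap_constraints */
    return a;
}

int swap_noasm_constants(void)
{
    int a = 10, b = 20;
    return swap_noasm(a, b) + 15;   /* 20 + 15, foldable to 35 at compile time */
}
```

Here the compiler sees every value, so it can fold the whole call into the constant 35, which is what the `movl $35, %eax` output reflects.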

edited Feb 23 '19 at 06:07 by Peter Cordes
answered Aug 28 '17 at 01:35 by Bryon Gloden

You don't need an early-clobber, because `xchg` is a single instruction. – Peter Cordes Aug 28 '17 at 01:39
@PeterCordes I made a typo in my comment. Was meant to be % not & since I was talking about commutative properties. He propagated my typo to the answer. – Michael Petch Aug 28 '17 at 01:40
@MichaelPetch I thought [commutative](https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html) only applied to inputs (aka "read-only operands"). – David Wohlferd Aug 28 '17 at 01:42
Yes, that's a good point @DavidWohlferd . That is what the docs say but I'm not entirely sure they are correct. GCC won't complain and I think I had observed a change in behavior in generated code in the past. Might have to ask the GCC mailing list about it lol – Michael Petch Aug 28 '17 at 01:45
@MichaelPetch: I don't think it gains you anything to tell the compiler they're commutative. You're already using the same `"+r"` constraint for both, so it can pick any register it wants. It would be relevant for `"+%a"` and `"+d"`, though. Update: https://godbolt.org/g/HxSVHQ `"+%S"(a)` doesn't work as commutative. The compiler still puts `a` into RSI instead of using `b` which is already there. – Peter Cordes Aug 28 '17 at 01:51
@Chris: I expanded your answer a bunch. I could have posted that as my own answer, but your answer already had the right framework to add some stuff. (I intended to just change `&` to `%`...) Anyway, roll back if you don't like it, and I can post my own. – Peter Cordes Aug 28 '17 at 02:27
Hmm, my example worked fine for testing, but it makes a poor example, or at least badly named functions. `swap_constraints` uses an asm swap, but only returns the resulting `a`. – Peter Cordes Aug 28 '17 at 02:35
@MichaelPetch: I noticed super-weird behaviour. In a file by itself, swap with a `"+%S"(a)` constraint produced extra instructions. But at the bottom of the godbolt link I edited into this answer, it doesn't! Somehow gcc compiles the exact same function differently depending on what it did earlier. (I think it's taking advantage of the fact that `b` is unused after the asm, since it's not like that if I `return a+b`) – Peter Cordes Aug 28 '17 at 02:35
@MichaelPetch: Actually I think gcc7.2 maybe is looking inside the inline-asm and recognizing the `xchg`. But only when the string is exactly `"xchg %0, %1;"` including the `;`, and with no extra space (leading space, or extra space between the `xchg` and the `%0`, give the expected 2-instruction setup). https://godbolt.org/g/gb6EEf. I can reproduce this with gcc7.1 on my desktop. I thought the fact that the compiler doesn't look at / understand the asm string was an absolute for GNU C inline asm. So maybe this is a compiler bug. – Peter Cordes Aug 28 '17 at 02:50
@MichaelPetch: Ah, it was triggered by being in the same compilation unit as another function that used the same string with different constraints first. Definitely a bug. Reporting it. – Peter Cordes Aug 28 '17 at 02:57
"not entirely sure [the docs] are correct" Ahh, so for those who don't find writing inline asm challenging enough, you can always toss in some undocumented behavior... Hopefully all this has driven home the point to OP that using inline asm is *tricky*. Even the long-time experts struggle with it, even with absurdly simple examples. – David Wohlferd Aug 28 '17 at 03:31
@DavidWohlferd Linus was a user of some of those undocumented (at the time) features lol. `P` and `p` constraints come to mind. lol – Michael Petch Aug 28 '17 at 04:01
@MichaelPetch: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82001 in case you're interested. It only happens when the whole function is identical other than asm register constraints, and with `-O2` or higher. Regression introduced in gcc5.0. Probably some kind of identical-function folding doesn't check constraints carefully enough. – Peter Cordes Aug 28 '17 at 05:49
