8

I'm working on some code where it would be highly desirable to take condition-flags output from an inline asm block and use that as a condition to branch on in the calling C code. I don't want to store the flags (that would be useless and inefficient; there are already more efficient ways to achieve the result) but use the flags directly. Is there any way to achieve this with GNU C inline asm constraints? I'm interested in approaches that would work for multiple instruction set architectures, with the intent of using it with the condition flags produced by the architecture's LL/SC style atomics. Of course another obvious usage case (separate from what I'm doing) would be to allow the outer C code to branch on the result of the carry flag from an operation in inline asm.

R.. GitHub STOP HELPING ICE
  • You might find [this](https://gcc.gnu.org/ml/gcc/2015-05/msg00006.html) discussion interesting, if not currently helpful. And let me make another plug for intrinsics. While it may seem like inline asm provides good solutions, the code *around* your inline stuff can pay a performance penalty that costs more than whatever benefit you get from the asm. Just saying. – David Wohlferd May 19 '15 at 08:19
  • There is certainly something to be said for intrinsics, but the devil is in the details. On some targets they produce very bad code. On ARM they produce a `dmb sy` barrier (synchronize with all external hardware on the bus) rather than the desired `dmb ish` (synchronize only with cpu cores). And on others they produce library calls instead of inline code, or produce code that's incompatible with some newer targets when using an older `-march` rather than allowing the runtime branches with multiple variants we need for such targets. – R.. GitHub STOP HELPING ICE May 19 '15 at 21:10
  • Thankfully modern compilers are **very good** about integrating inline asm and C as long as the constraints are written well. Right now I have the whole ll/sc sequence inside an asm block, which requires separate asm per atomic operation per target. If I could refactor to separate asm blocks for the "ll" and "sc" parts, with C code for the operation in between, I'd have identical efficiency without any of the per-target duplication; all ll/sc-type targets could use the same C code with only the ll and sc asm varying. – R.. GitHub STOP HELPING ICE May 19 '15 at 21:13

3 Answers

7

Starting with GCC 6, on x86 you can use "=@ccCOND" as an output constraint (where COND is any valid x86 condition code).

Example originally from here, cleaned up by David's suggestions:

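int variable_test_bit(long n, volatile const unsigned long *addr)
{
    int oldbit;
    /* AT&T operand order is "bt <bit index>, <bit base>"; CF receives the selected bit. */
    asm volatile("bt %[bit],%[value]"
                 : "=@ccc" (oldbit)   /* output is taken directly from the carry flag */
                 : [value] "m" (*addr), [bit] "Jr" (n));
    return oldbit;
}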

Before using this, you should test if __GCC_ASM_FLAG_OUTPUTS__ is defined.
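
As a rough sketch of what such a guard could look like, the following uses the carry-flag case mentioned in the question. The helper name `add_carry` and the fallback to `__builtin_add_overflow` are assumptions made for illustration, not part of the original answer. The extra architecture check is there because the macro only says that flag outputs exist, not which condition names the target accepts.

```c
/* Hypothetical helper: *a += b, returns nonzero if the addition carried out. */
static inline int add_carry(unsigned long *a, unsigned long b)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    int carry;
    /* AT&T syntax: add <src>, <dst>. The "=@ccc" output reads CF afterwards. */
    __asm__ ("add %[b], %[a]"
             : [a] "+r" (*a), "=@ccc" (carry)
             : [b] "r" (b));
    return carry;
#else
    /* Fallback (assumes GCC 5+ or Clang): for unsigned operands, overflow is the carry. */
    return __builtin_add_overflow(*a, b, a);
#endif
}
```

A caller can then write `if (add_carry(&x, y)) { ... }`, and with flag outputs available the compiler is free to branch on the flag rather than materialize a 0/1 value first.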

Documentation at https://gcc.gnu.org.

chtz
  • I'm not sure `I` is the right constraint here. Longs can be 64 bits on some platforms and `I` is only for values 0..31. I'm also not clear why we need to cast away the `const` and `volatile`. It's an input param, so const shouldn't be a problem. It's an `m`, so volatile is implicit. I might also be tempted to add symbolic names (`[bit]` and `[value]`). Doesn't affect functionality, but might be clearer for those of us used to Intel format. – David Wohlferd Apr 25 '18 at 00:48
  • @DavidWohlferd I admittedly did not manage to have gcc produce an `adc` instruction, when adding `oldbit` with two other integers ... – chtz Apr 25 '18 at 01:14
  • I wouldn't expect it to do adc. Anyway, that wasn't the OP's requirement. He wanted jumps. Your code does that (https://godbolt.org/g/gA2qgL). – David Wohlferd Apr 25 '18 at 01:31
  • If you only want to allow 0..63, you should ask for `*addr` in a register. The memory-operand form of `bt` is much slower, especially with a register source ([10 uops on Haswell vs. 1 for reg,reg](http://agner.org/optimize/)). Also, a register source would allow accessing memory other than the qword your operand tells gcc about, so you'd really want [something like this](https://stackoverflow.com/questions/1956379/att-asm-inline-c-problem/47358313?noredirect=1#comment81680441_47358313): `[value] "m" (*(const unsigned long (*)[]) addr)` to tell gcc it might access anywhere relative to `addr`. – Peter Cordes Apr 25 '18 at 06:29
  • TL:DR: `bt*` instructions with a memory operand have insane-CISC bitstring semantics: http://felixcloutier.com/x86/BT.html, and this is handled with microcode for register sources. Best to make the compiler load into a register. – Peter Cordes Apr 25 '18 at 06:31
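
A minimal sketch of the register-operand variant suggested in the comments above, assuming the caller has already loaded the word and that `n` is smaller than the bit width of `unsigned long`. The function name `test_bit_reg` and the plain-C fallback are illustrative assumptions.

```c
/* Hypothetical helper: returns bit n of word; assumes n < width of unsigned long. */
static inline int test_bit_reg(unsigned long word, unsigned long n)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    int is_set;
    /* Both operands are registers, so bt uses its fast reg,reg form. */
    __asm__ ("bt %[bit], %[value]"
             : "=@ccc" (is_set)
             : [value] "r" (word), [bit] "r" (n));
    return is_set;
#else
    /* Portable fallback when flag outputs are unavailable. */
    return (int)((word >> n) & 1UL);
#endif
}
```

Because the asm only depends on its inputs, it is not marked `volatile` here, which leaves the compiler free to drop or combine calls.
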
1

I have a partial solution, but I don't really like it because it requires putting the branch instruction inside the asm, and because it requires a very ugly GCC feature that other "GNU C compatible" compilers might not support: asm goto. It does however allow the branch outside the asm to be eliminated. The idea is:

static inline int foo(...)
{
    __asm__ goto ( " .... ; cond_jmp %l[ret0]" : : "r"(...) ... 
                   : "clobbers" : ret0 );
    return 1;
ret0:
    return 0;
}

When inlined into the caller that does if (foo(...)) ... else ..., the conditional jump in the asm block ends up pointing directly to the else branch, even though at the abstract-machine level there are return values involved.
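
To make the pattern concrete, here is a small x86-only sketch with a plain-C fallback. The function name `bit_is_set_goto`, the choice of `bt`/`jnc`, and the fallback are assumptions made for illustration; the structure follows the outline above.

```c
/* Hypothetical helper: returns 1 if bit n of word is set; assumes n < width of unsigned long. */
static inline int bit_is_set_goto(unsigned long word, unsigned long n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* bt copies the selected bit into CF; jnc jumps to the C label when it is 0. */
    __asm__ goto ("bt %1, %0\n\t"
                  "jnc %l[clear]"
                  : /* no outputs */
                  : "r" (word), "r" (n)
                  : "cc"
                  : clear);
    return 1;
clear:
    return 0;
#else
    /* Portable fallback for other targets. */
    return (int)((word >> n) & 1UL);
#endif
}
```

Used as `if (bit_is_set_goto(w, n)) { ... } else { ... }`, the `jnc` can end up targeting the else branch directly once the function is inlined.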

R.. GitHub STOP HELPING ICE
  • The Linux kernel uses this kind of construct in several places, and apparently it does optimize as well as you'd hope when you do `if( foo(...) ) {} else {}`. The compiler never actually puts a `1` or `0` in a register, just uses the `asm goto` as the `if` branch. You are forcing the choice of which side is taken vs. fall-through, though. Probably gcc6 condition-code outputs will optimize at least as well, unless your code has an early-out condition (so the conditional branch isn't the last instruction in your asm statement). In that case, this is still useful, +1. – Peter Cordes Apr 25 '18 at 06:20
-1

Unfortunately GCC doesn't support accessing condition flags outside of asm statements. If you don't want to set a value, then you'll have to move the conditional branch into the asm statement. That means either using the asm goto labels that you've already discovered, or also bringing the branch target into your asm statement.

You might also want to check to see if either GCC's old style __sync atomic builtins or the newer memory model based atomics provide the functionality you want out of the atomic instructions you're using.
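
As a hedged illustration of the builtin route, the sketch below writes a fetch-and-max operation as a compare-exchange loop using the `__atomic` builtins. The operation, the name `atomic_fetch_max_int`, and the memory orders are assumptions chosen for the example. On LL/SC targets the compiler typically expands the weak compare-exchange into an ll/sc pair, although the code it generates varies by target and version.

```c
/* Hypothetical helper: atomically sets *p = max(*p, v) and returns the previous value. */
static inline int atomic_fetch_max_int(int *p, int v)
{
    int old = __atomic_load_n(p, __ATOMIC_RELAXED);
    /* The builtin returns nonzero on success; on failure it stores the
       current value of *p into old, so the loop re-tests with fresh data. */
    while (old < v &&
           !__atomic_compare_exchange_n(p, &old, v, 1 /* weak */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;
    return old;
}
```

The C code branches on the builtin's return value, so how the success condition reaches the branch is left to the compiler.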

Ross Ridge