
On my computer, the compiled executable appears to skip "mov %2, %%ax" at the top of the loop when "add %1, %%ax" is uncommented.

Can anyone double-check or comment?

    #include <stdio.h>

    int main() {
        unsigned short result, low, high;

        low  = 0;
        high = 1;

        __asm__ (
            "movl $10, %%ecx \n\t"
            "loop: mov  %2, %%ax \n\t"
    //      "add    %1, %%ax \n\t"      // uncomment and result = 10
            "mov    %%ax, %0     \n\t"
            "subl   $1, %%ecx \n\t"
            "jnz loop"
            : "=r" (result)
            : "r" (low), "r" (high)
            : "%ecx", "%eax" );

        printf("%d\n", result);
        return 0;
    }

The generated assembly follows:

    movl $1, %esi
    xorl %edx, %edx
    /APP
    movl $10 ,%ecx
    loop: mov %si, %ax
    mov  %dx, %bx
    add %bx, %ax
    mov %ax, %dx
    subl $1, %ecx
    jnz loop
    /NO_APP

Thanks to Jester, the solution is:

    : "=&r" (result)        // early clobber modifier
OneArb

1 Answer


GCC inline assembly is advanced programming, with a lot of pitfalls. Make sure you actually need it and can't replace it with a standalone assembly module, C code using intrinsics, or the compiler's vector support.

If you insist on inline assembly, you should be prepared to at least look at the generated assembly code and try to figure out any mistakes from there. The compiler does not omit anything that you write into the asm block; it just substitutes the arguments. If you look at the generated code, you might see something like this:

    add    %dx, %ax
    mov    %ax, %dx

Apparently the compiler picked dx for both argument 0 and 1. It is allowed to do that, because by default it assumes that the input arguments are consumed before any outputs are written. To signal that this is not the case, you must use an early clobber modifier for your output operand, so it would look like "=&r".

PS: Even when inline assembly seems to work, it may have hidden problems that will bite you another day, when the compiler happens to make other choices. You should really avoid it.

Jester
  • Voted for your answer. I'm still trying to figure out what the `%1` and `%0` syntax are doing. A pointer to a source to learn something about these would be welcome. – User.1 Oct 26 '14 at 00:08
  • Thanks for the solution! I have yet to find out how to launch the assembler on MinGW. I use inline assembly in early testing to get a feel for how fast the inner loop can run. I'll look into the standalone ASM module and other pointers. – OneArb Oct 26 '14 at 00:44
  • @User.1 if you are looking for docs about gcc's inline asm, I recommend going right to the source: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html – David Wohlferd Oct 29 '14 at 04:12
  • Your statement that inline assembly should be avoided puzzles me. When you only want to optimize a few functions in assembly and you want those functions to be inline what other option is there? For example imagine you wanted a 256-bit big integer add using add and adc. You clearly want this function to be inlined. – Z boson Mar 13 '15 at 08:49
  • @Zboson: I assumed he meant "avoided when you can do it another way", which I'd agree with. It's really easy to make a mistake with inline asm that doesn't show up under the test conditions. In this case, though, **there isn't another way** to get the compiler to emit `add/adc/adc/adc`. gcc and clang will emit `adc` if you write stuff like `high += hiAdd + (low < lowAdd)`, but only after actually doing a `cmp` to set CF, because it doesn't realize add will set CF the same way. http://goo.gl/Ea17p8 – Peter Cordes Dec 23 '15 at 06:46
  • @PeterCordes, yeah, anybody following my questions/answers recently can probably tell that I am still learning inline assembly. Part of the reason I have asked these questions: http://stackoverflow.com/questions/34415238/embedded-broadcasts-with-intrinsics-and-assembly and http://stackoverflow.com/questions/34244185/looping-over-arrays-with-inline-assembly is to get somebody to show me a better way to do it with inline assembly. Hint: please answer the questions. BTW, for multi-word add MSVC and ICC can do it efficiently with intrinsics so inline assembly is not necessary. – Z boson Dec 23 '15 at 07:45