How to make inline assembly work with -O1 optimization

Question

I have the following dispatch code for my user-level thread library.

The code compiles with GCC and runs correctly without optimization, but if I compile with -O1 (or any higher level), the program crashes with a segmentation fault at run time.

Basically, the function saves the current context and jumps to the next context.

void __attribute__ ((noinline)) __lwt_dispatch(lwt_context *curr, lwt_context *next)
{
__asm__ __volatile
    (

    "mov 0xc(%ebp),%eax\n\t"
    "mov 0x4(%eax),%ecx\n\t"
    "mov (%eax),%edx\n\t"
    "mov 0x8(%ebp),%eax\n\t"
    "add $0x4,%eax\n\t"
    "mov 0x8(%ebp),%ebx\n\t"
    "push %ebp\n\t"
    "push %ebx\n\t"
    "mov %esp,(%eax)\n\t"
    "movl $return,(%ebx)\n\t"
    "mov %ecx,%esp\n\t"
    "jmp *%edx\n\t"
    "return: pop %ebx\n\t"
    "pop %ebp\n\t"
    );
}

What does that mean? I am running this together with other functions in 32 bit linux GCC compiler. — Larry, Feb 03 '17 at 18:53
He's asking whether you tried just writing the entire `__lwt_dispatch` in assembler and linked that to your code. The reality here is that your code assumes there is a stack frame among other things (_EBP_ may not be used in that case). Your code also destroys the contents of _EBX_ before it is pushed. This would violate the [_CDECL_](https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl) calling convention since _EBX_ is a callee saved (non-volatile) register. — Michael Petch, Feb 03 '17 at 19:53
GCC inline assembler is hard to get right, and it would be much easier (if you are new to inline assembler in GCC) if you wrote the function purely in assembler and linked that into your code. As it stands you should be looking at an [extended inline assembler template](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) with constraints that allow you to pass the values of the parameters into the template, specify what is used as input and output, and which registers get clobbered. — Michael Petch, Feb 03 '17 at 19:56
You sort of skirt some of these issues by trying to make the entire function non-inline, but that can lead to code that may appear to work but not always. Writing this function purely in assembler and linking against it would avoid needing to understand GCC's hard to get right inline assembler, and you would also have direct knowledge of where variables are on the stack and whether a stackframe is present in your function. — Michael Petch, Feb 03 '17 at 19:59
Also, I'm not sure that the `noinline` attribute survives all types of optimizations. I believe that some of them (-fwhole-program? LTO?) can inline anyway. — David Wohlferd, Feb 03 '17 at 23:18
Thanks for your help! I compiled the dispatch function as a separate .o file and linked it with the others using -O3 optimization; it worked and performs well. I will check whether I can inline this function with register clobbers and protection. — Larry, Feb 04 '17 at 22:12
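
The comments above suggest passing parameters into the template through constraints instead of reading them from fixed `%ebp` offsets. A minimal sketch of that idea follows, assuming an x86 target and a GCC-compatible compiler. The struct `lwt_context_sketch` and the function `load_saved_sp` are hypothetical names; the field layout is only inferred from the offsets in the question's assembly and may not match the real `lwt_context`. This is not a full context switch, just the operand-passing part.

```c
/* Hypothetical stand-in for lwt_context. The layout is an assumption
 * inferred from the question's asm: offset 0 is the jump target,
 * the next word is the saved stack pointer. */
typedef struct {
    void *ip;  /* address to resume at */
    void *sp;  /* saved stack pointer */
} lwt_context_sketch;

/* Reads next->sp through an "m" (memory) input operand, so the code
 * does not depend on a frame pointer or on where the argument lives. */
void *load_saved_sp(const lwt_context_sketch *next)
{
    void *sp;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("mov %1, %0"        /* AT&T order: source, destination */
            : "=r"(sp)          /* output: any general register */
            : "m"(next->sp));   /* input: the field, addressed by the compiler */
#else
    sp = next->sp;              /* portable fallback for other targets */
#endif
    return sp;
}
```

Because the compiler chooses the register and the addressing mode, the same template should work with or without a stack frame and at any optimization level.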

score 0 · Answer 1 · edited May 23 '17 at 12:16

Thanks for the help. I figured out some ways to solve it.

Compile this function as a separate .o file, then link it with the other files using -O3.
Extended inline assembly with operand constraints is much easier and simpler than what this function does. For example:

int foo = 10, bar = 15;
asm volatile("addl %%ebx,%%eax"
             : "=a"(foo)
             : "a"(foo), "b"(bar));
printf("foo+bar=%d\n", foo);

Another post helped me figure out the labeling problem: Labels in GCC inline assembly
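
On the labeling problem: a named label such as `return:` is emitted verbatim, so the assembler reports a duplicate symbol if the same asm statement ends up in the output more than once (for example after inlining). One common way around this is to use GNU assembler numeric local labels. The sketch below assumes an x86 target and a GCC-compatible compiler; `count_iterations` and `count_both` are hypothetical names used only for illustration.

```c
/* Counts n loop iterations in assembly. The numeric labels 1: and 2:
 * may be defined many times in one file; "1b" means the nearest 1:
 * backward and "2f" the nearest 2: forward. */
unsigned count_iterations(unsigned n)
{
    unsigned count = 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "test %1, %1\n\t"   /* skip the loop entirely when n == 0 */
        "jz 2f\n"
        "1:\n\t"
        "inc %0\n\t"
        "dec %1\n\t"
        "jnz 1b\n"
        "2:\n\t"
        : "+r"(count), "+r"(n)  /* both are read and written */
        :
        : "cc");                /* flags are modified */
#else
    while (n--)                 /* portable fallback for other targets */
        count++;
#endif
    return count;
}

/* Uses the asm twice in one translation unit; with a named label this
 * could fail to assemble if both copies were inlined here. */
unsigned count_both(unsigned a, unsigned b)
{
    return count_iterations(a) + count_iterations(b);
}
```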

answered Feb 05 '17 at 01:14

Larry

I understand how seductive inline asm is. The fact that I enjoy playing with it is why I am here answering questions about it. But there are some [really good](https://gcc.gnu.org/wiki/DontUseInlineAsm) reasons not to use it. (Speaking from experience) the more you learn about it, the more you realize you should leave it alone except under *very unusual* circumstances. Among those reasons are the difficulties involved in getting it right. Looking at your (very) simple example, you have a number of flaws, which won't give incorrect answers, but can produce unnecessarily inefficient code. – David Wohlferd Feb 06 '17 at 22:52
1) The `volatile` qualifier should be omitted since it *requires* gcc to calculate the value, even if the optimizer determines that the output is unused. 2) Letting gcc pick which registers to use typically results in better code than hard coding them since you don't know how those registers are being used in surrounding code. 3) Permitting gcc to use memory operands can avoid the expense of saving/restoring registers if gcc is register constrained. 4) You should allow for using immediate mode (i.e. `addl $5, %eax`) since bar might be a constant. – David Wohlferd Feb 06 '17 at 22:53
5) If both foo and bar are constants, gcc can compute the result while compiling, *unless* you are using inline asm, which must always do the calculation at runtime. 6) You don't allow for the operands to be commutative, which might be a performance win. All this from a SINGLE LINE of code. Which doesn't even address portability (to other hardware, compilers, etc), maintainability, etc. If the goal is learning/playing, then go for it. But it's a bad habit to acquire, and should be avoided in production code. – David Wohlferd Feb 06 '17 at 22:54
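
Several of the points above can be sketched in code. The version below drops `volatile`, lets the compiler pick the register for `foo`, and uses the `g` constraint so `bar` may be a register, a memory operand, or an immediate. It assumes an x86 target and a GCC-compatible compiler; `add_flexible` and `add_plain` are hypothetical names. The commutativity point (6) is not addressed here, and plain C remains the only form the compiler can fold at compile time.

```c
/* Plain C: the optimizer can constant-fold, reorder, or remove this. */
int add_plain(int foo, int bar)
{
    return foo + bar;
}

/* Inline asm with relaxed constraints, for comparison. */
int add_flexible(int foo, int bar)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(foo)   /* read-write, any general register */
            : "g"(bar)    /* register, memory, or immediate */
            : "cc");      /* the add changes the flags */
#else
    foo += bar;           /* portable fallback for other targets */
#endif
    return foo;
}
```

Without `volatile`, the compiler is free to drop the asm when the result is unused, which is usually what you want for a pure computation like this.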
