Bus error: Inline x86 assembly with GCC asm{} block on Mac OS X, converting from MSVC naked function

Question

I am getting a "Bus error" when I try to run this code compiled with gcc 4.2.1 on Snow Leopard

#include <stdio.h>

/*__declspec(naked)*/ void
doStuff(unsigned long int val, unsigned long int flags, unsigned char *result)
{
    __asm{
        push eax
        push ebx
       push ecx
        push edx
    
        mov eax, dword ptr[esp + 24]//val
        mov ebx, dword ptr[esp + 28]//flags
        //mov ecx, dword ptr[esp + 32]//result

        and eax, ebx
        mov result, eax

        pop edx
        pop ecx
        pop ebx
        pop eax

        ret
    }
}

int main(int argc, char *argv[])
{
    unsigned long val = 0xAA00A1F2; 
    unsigned long flags = 0x00100001;   
    unsigned char result = 0x0;
    
    doStuff(val, flags, &result);   
    printf("Result is: %2Xh\n", result);
    
    return 0;
}

I am using the following command to compile gcc -fasm-blocks -m32 -o so so.c without any errors or warnings. I am trying to run some assembly instructions in the doStuff() function and assign the answer to result. What am I doing wrong?

(The gcc command on this Mac is probably actually clang, or an Apple modification, since mainline GCC doesn't support -fasm-blocks.)

Note: This worked fine in Visual Studio on Windows, but I had to comment out declspec(naked) to get gcc to compile it on Mac.

@styfle - maybe your mac os is 64bit, and each element on the stack is 8bytes long? can you check it? — MByD, Apr 26 '11 at 01:04
@MByD I have a 64-bit CPU (Core 2 Duo) but the 64-bit Kernel is not enabled and that is why I needed the -m32 argument when compiling or else I get `suffix or operands invalid for push` errors. — styfle, Apr 26 '11 at 01:09
@MByD Or maybe it is enabled and that's why I needed the -m32 argument. Now I'm confused. Also, if there was an issue with the word length, wouldn't there be an error during compilation/assembly? — styfle, Apr 26 '11 at 01:20
@MByD Well thank you for the input. Are you able to compile and run it on your machine? — styfle, Apr 26 '11 at 01:22
I get the same results as you and trying some stuff. will update you :) — MByD, Apr 26 '11 at 01:30
after changing the registers to `rax`, `rbx` etc in the push and pop instruction I was able to compile without -32, but got segmentation fault. — MByD, Apr 26 '11 at 01:45
I just tried changing the registers to rax, etc and also got Segmentation fault. I would prefer to use the 32bit version because I am still new to assembly and haven't seen the 64 bit stuff. — styfle, Apr 26 '11 at 02:12

score 5 · Accepted Answer · edited May 09 '22 at 11:34

5

The reason you are getting the bus error is because you are calling ret within your assembly code. ret causes the program control to transfer to the return address at the top of the stack, which you manipulate through use of push and pop. I'd highly suggest looking up what ret does in the Intel Instruction Set Reference.

Below is code that I compiled, and runs successfully, on an iMac running Mac OS X 10.6.7.

#include <stdio.h>

/*__declspec(naked)*/ void
doStuff(unsigned long int val, unsigned long int flags, unsigned char *result)
{
  __asm
    {
        push eax
        push ebx
        push ecx

        mov eax, dword ptr[ebp + 8]  //val
        mov ebx, dword ptr[ebp + 12] //flags
        mov ecx, dword ptr[ebp + 16] //result

        and eax, ebx
        mov [ecx], eax

        pop ecx
        pop ebx
        pop eax
      }
}

int main(int argc, char *argv[])
{
  unsigned long val =   0xAA00A1F2;
  unsigned long flags = 0x00100002;
  unsigned char result = 0x0;
    
  doStuff(val, flags, &result);
  printf("Result is: %2Xh\n", result);
    
  return 0;
}

Notable changes are:

Removal of ret in the inline assembly
Use of register ebp instead of esp to reference the parameters to doStuff
Changing flags to be 0x00100002

Change (1) fixes the bus error, (2) made referencing the parameters a little more consistent, and (3) is just a quick way to make sure the function is working as expected.

Lastly, I highly recommend you become familiar with the GNU Debugger, GDB, if you haven't already. You can find more info about it at the project page http://www.gnu.org/software/gdb/ as well as information about the Mac implementation and tutorial at https://developer.apple.com/library/archive/documentation/DeveloperTools/gdb/gdb/gdb_toc.html.

edited May 09 '22 at 11:34

Cœur

37,241
25
195
267

answered Apr 26 '11 at 08:59

Dean Pucsek

257
2
7

The use of `ret` works in Visual Studio and I thought it was required since a function call is supposed to push the return address on the stack and ret pops it off so it knows where to go after the function call. Also I have never seen ebp. All I found is ebp is different than esp because "the base pointer is manipulated only explicitly." Lastly, I have no idea what gdb is. Can you provide a link? Thanks. – styfle Apr 26 '11 at 16:53
I just updated the answer with some links to GDB. I can't confirm this (I don't have Visual Studio on this machine), but my suspicion is that the reason using `ret` in VS worked was a result of how the compiler inserted your assembly into `doStuff()`. Try compiling your code with `gcc -S ...` to see what assembly the compiler generates. As you found out, `ebp` is the base pointer and it points to the bottom of the stack whereas `esp` points to the top. It doesn't really matter which one you use as long as you get the offsets right. – Dean Pucsek Apr 26 '11 at 18:05
You shouldn't manually `push`/`pop` registers inside an `asm{}` block, with either MSVC or `clang -fasm-blocks`. They both parse your asm and figure out which registers are actually written, and treat the block as clobbering those registers. (And if necessary, have the containing function save/restore any of those registers that are call-preserved in the calling convention, in whatever function this eventually inlines into). – Peter Cordes May 09 '22 at 12:06
https://godbolt.org/z/jascv6TaP. It does seem that this also convinces clang to set up EBP as a traditional frame pointer, but **doesn't prevent inlining! So you can end up trying to access the wrong function's args.** (as demoed in the Godbolt link). Actually, taking out the push/pop instructions so the code in the asm block doesn't use stack operations (https://godbolt.org/z/4z75d4WfY), clang doesn't use an EBP frame ptr. I guess it was setting up EBP because it saw stack operations inside the asm block. – Peter Cordes May 09 '22 at 12:06
The right fix, with modern clang, is to **use `__attribute__((naked))`**. That might not have been supported in clang in 2011 for x86, IDK, but this answer was never safe. Or use the named C vars: https://godbolt.org/z/b6zTGEKhf – Peter Cordes May 09 '22 at 12:09

score 2 · Answer 2 · answered Apr 26 '11 at 19:58

Compilers add prologues and epilogues to function calls, these prologues and epilogues take care of setting up stack frames, reserving stack space for local variables, destroying stack frames and returning to the caller.

A typical prologue for a function with no local variables when using a frame pointer may look like:

push ebp
mov ebp, esp

This saves the caller's frame pointer on the stack, and makes the current frame pointer equal to the stack pointer at function entry time.

The corresponding epilogue would be:

pop ebp
ret

which restores the previous frame pointer and returns to the caller.

If you tell gcc not to use frame pointers (-fomit-frame-pointer), the corresponding prologue will be empty, and the epilogue will just contain a ret.

That __declspec(naked) is probably similar to gcc's __attribute__((naked)) (gcc's function attributes), which only works for some architectures, and not on x86. So, on gcc, you better leave returning to the caller to the compiler, as Dean Pucsek advised you.

When you say prologue and epilogue, are you talking about something that happens at the beginning and end of a function that the compiler automatically does, or something that a programmer usually does? Or am I way off? — styfle, Apr 26 '11 at 22:00

Peter Cordes · Answer 3 · 2022-05-09T14:24:32.310

Don't ret from inside an asm block or statement, except with __attribute__((naked)).
Don't make assumptions about [ebp+x] holding certain C variables in non-naked inline asm.

@Dean Pucsek's answer (using [ebp+8], 12, 16 in a non-naked function) may happen to work with optimization disabled, but it breaks spectacularly¹ in a normal -O2 build. (doStuff inlines into its caller, which may have different args at [ebp + 8], 12, and 16). Unless you happen to have it in a separate compilation unit from any caller, and don't use -flto.

You have three options, other than not using inline asm for such a trivial thing in the first place:

Remove the ret and change it to use named C variables for the args, instead of making assumptions about this function not inlining. (You can also use __attribute__((noinline)) if you want it not to inline for some reason, but there's no benefit to hard-coding the calling convention, and no need if not in a naked function).
Move the asm to a separate .s file, and just declare a prototype in C or C++.
Use __attribute__((naked)) which is now supported for x86 by clang. (And GCC, but mainline GCC itself doesn't support -fasm-blocks. You're using an Apple version of GCC, or actually Clang installed as gcc for shell scripts / Makefiles like current MacOS does.)

Converting it to actual inline asm without naked or noinline will let it inline (unlike options 2 or 3) so it's a bit less inefficient, but it's still an asm block, not a GNU C asm statement like this (see the Godbolt link below for this in action inside a function)

    asm("and %[flags], %[outval]"                    // AT&T syntax:  op src, dst.  clang always parses this way, gcc with -masm=intel treats inline asm as Intel-syntax
      : [outval]"=r"(*(unsigned char(*)[4])result)   // 4-element uchar array.  Normally type-pun deref is strict-aliasing UB, but GCC documents this for asm.
      : "0"(val) /*pick same register as output 0*/, [flags]"r" (flags)  // reg, mem, or immediate source
      : // no clobbers
    );

It's a bit weird that your current asm stores 4 bytes to an unsigned char* output (which is why I couldn't just use "=r"(*result)), but I guess it's a char array, or actually pointing to an unaligned dword somewhere? I preserved that instead of telling the compiler we only want the low byte of the output.

That would let the compiler have both inputs in registers and handle the mov instructions for you. See https://stackoverflow.com/tags/inline-assembly/info. But of course it's still an opaque inline asm statement that the optimizer can't see through or anything, so https://gcc.gnu.org/wiki/DontUseInlineAsm if you can possibly avoid it.

With an "rmi" or "r,m,i" constraint, actual GCC would be able to smartly choose an immediate or memory, but clang is dumb about this and always picks memory. Or for x,y,z always picks the first option, which is why I put register first.

You wouldn't have this problem if using val &= flags; memcpy(result, &val, sizeof(val)); - both GCC and clang could optimally do the and with minimal wasted mov instructions.

Still using MSVC-style asm-blocks

If you really want inefficient MSVC-style asm-blocks stuff that forces the compiler to have the inputs and outputs in memory, not registers, use

// without __attribute__((naked))
void
doStuff(unsigned long int val, unsigned long int flags, unsigned char *result)
{
  __asm         // don't push/pop inside the asm block: the compiler still sees all touched registers as clobbered and saves itself if necessary
    {
        mov eax, [val]      // compiler will fill in [esp+x] or [ebp+y] or whatever for C var names
        and eax, [flags]    // memory-source AND is fine
        mov ecx, [result]   // load the pointer variable
        mov [ecx], eax    // deref it, storing 4 bytes to result[0..3]
      }
      // for non-void functions: beware MSVC supports falling off the end without a return statement, with a value left in EAX by an asm{} block.  (Even respecting that EAX result after inlining this function into another).
      // clang -fasm-blocks doesn't: it's undefined behaviour in C++, or in C if the caller uses it.
}

Godbolt with clang14.0 -O3 -m32 -fasm-blocks -Wall -fno-pie - the stand-alone version is about as efficient as it could be, given the clunky stack-args calling convention and the asm{} block which doesn't let it do any better.

doStuff(unsigned long, unsigned long, unsigned char*):
// start of inline asm
        mov     eax, dword ptr [esp + 4]
        and     eax, dword ptr [esp + 8]
        mov     ecx, dword ptr [esp + 12]
        mov     dword ptr [ecx], eax
// end of inline asm
        ret

A test caller shows that it inlines safely (but inefficiently as you'd expect from an asm{} block):

// test caller
unsigned char global_charbuf[4];

void foo(unsigned long int flags, unsigned char *result) {
   // result unused, unless you edit to pass it instead
    doStuff(1234, flags, global_charbuf);  // safe after inlining: stores 1234 to the stack
}

foo(unsigned long, unsigned char*):
        sub     esp, 12                    // space for uchar *result local var
        mov     eax, dword ptr [esp + 16]      // foo's flags arg
        mov     dword ptr [esp + 8], 1234
        mov     dword ptr [esp + 4], eax       // This copy seems unnecessary; asm should be able to simply reference foo's stack arg
        mov     dword ptr [esp], offset global_charbuf   // It's not preserving their relative order.

// start of inline asm
        mov     eax, dword ptr [esp + 8]
        and     eax, dword ptr [esp + 4]
        mov     ecx, dword ptr [esp]
        mov     dword ptr [ecx], eax
// end of inline asm

        add     esp, 12
        ret

The same test caller passing the same args to a wrapper function using GNU C asm("..." :outputs :inputs :clobbers) compiles to not perfect but much less horrible asm. The the Godbolt link for source for that.

bar(unsigned int, unsigned char*)
        mov     eax, dword ptr [esp + 4]   # compiler-generated loads of the "r" register inputs
        mov     ecx, 1234                  # clang is incapable of getting inline asm to use an  AND ecx, 1234  unless we *only* allow an immediate source.  Or maybe use __builtin_constant_p() around multiple separate blocks.
// start of asm
        and     ecx, eax                    # =r output picked EAX, "r" input picked ECX
// end of asm
        mov     dword ptr [global_charbuf], ecx     # Compiler-generated store of the "=r" output
        ret

You shouldn't manually push/pop registers inside an asm{} block, with either MSVC or clang -fasm-blocks. They both parse your asm and figure out which registers are actually written, and treat the block as clobbering those registers. (And if necessary, have the containing function save/restore any of those registers that are call-preserved in the calling convention, in whatever function this eventually inlines into.)

Apparently some really early C++ implementations, like I think Borland Turbo C++, did not parse your asm, so you did need to push/pop manually to avoid stepping on the compiler's toes. But that sounds even more clunky and inefficient; fortunately those days are long passed.

Footnote 1: broken asm if you get this wrong

// FIXME: use __attribute__((naked)) and put the RET back in, with [ESP+x] addressing

/*__declspec(naked)*/ void
broken_doStuff(unsigned long int val, unsigned long int flags, unsigned char *result)
{
  __asm
    {
        push ebx               // keep one unnecessary push, just the one you'd need in an actual naked function to not violate the calling convention.  EAX and ECX are call-clobbered.  So is EDX, but not EBX.

        mov eax, dword ptr[ebp + 8]  //val
        mov ebx, dword ptr[ebp + 12] //flags
        mov ecx, dword ptr[ebp + 16] //result

        and eax, ebx
        mov [ecx], eax

        pop ebx
      }
}

void foo(unsigned long int flags, unsigned char *result) {
    doStuff(1234, flags, result);
  // broken: inlines the asm that assumes 3 args on the stack
}

Compiled on Godbolt with clang14.0 -m32 -O3 -fasm-blocks. The stand-alone definition of doStuff is fine; inefficient but works. The problem is when it inlines into foo:

foo(unsigned long, unsigned char*):
        push    ebp               // a push or pop in the inline asm makes clang use EBP as a frame pointer
        mov     ebp, esp
        push    ebx               // generated by clang, since asm writes this call-preserved reg
  // no  mov dst, 1234   anywhere:
  // clang doesn't see doStuff using its "val" arg

// Start of inline asm
        push    ebx
        mov     eax, dword ptr [ebp + 8]   // wants val, actually loads foo's first arg, flags
        mov     ebx, dword ptr [ebp + 12]
        mov     ecx, dword ptr [ebp + 16]  // these also access the wrong things
        and     eax, ebx
        mov     dword ptr [ecx], eax
        pop     ebx
// end of inline asm

        pop     ebx                 // compiler-generated epilogue
        pop     ebp                 // not leave, it assumes inline asm balanced the stack, and knows it didn't allocate any stack space itself.
        ret

This will normally crash when it tries to store using a unsigned char *result that it loaded garbage for, from somewhere in the caller's stack frame above foo's stack args.

I left in the push/pop to demonstrate that the compiler already sees EBX being written and saves/restores it. If you take that out, it also doesn't set up EBP as a frame pointer, since it knows the stack isn't moving inside the asm statement so things like val can expand to [esp+y] instead of [ebp+x].

This also doesn't respect -mregparm=3 to use a register-arg calling convention. A regular asm{ mov eax, [val] } would, although the compiler would still just spill val to memory because the whole semantics of asm{} blocks are designed around all inputs being in memory, no registers occupied with inputs from the start.

Bus error: Inline x86 assembly with GCC asm{} block on Mac OS X, converting from MSVC naked function

3 Answers3

Still using MSVC-style asm-blocks

Footnote 1: broken asm if you get this wrong