I'm trying to run some assembly on macOS 10.13.6.

What I want to do is read two inputs, subtract the first from the second, and print the result. In Python, that would be something like: print -input() + input(). I delegate the input and printing to libc by linking with a separate C object file.

What I cannot understand is why the program crashes when I change the amount of stack space reserved from 8 bytes to 4.

Below is the working assembly code:

.section __TEXT,__text
.globl _main
_main:
        push %ebp
        mov %esp, %ebp
        subl $8, %esp
        call _input
        neg %eax
        mov %eax, -4(%ebp)
        call _input
        add -4(%ebp), %eax
        push %eax
        call _print_int_nl
        add $4, %esp
        mov $0, %eax
        leave
        ret

And below is the code that does not work.

.section __TEXT,__text
.globl _main
_main:
        push %ebp
        mov %esp, %ebp
        subl $4, %esp
        call _input
        neg %eax
        mov %eax, -4(%ebp)
        call _input
        add -4(%ebp), %eax
        push %eax
        call _print_int_nl
        add $4, %esp
        mov $0, %eax
        leave
        ret

As you can see, there is no modification except the stack reservation size.

This is the C file I'm linking with:

#include <stdio.h>

void print_int_nl(int x) { printf("%d\n", x); }

int input() {
    printf("In input...\n");
    int i;
    scanf("%d", &i);
    return i;
}

I'm compiling, assembling, and linking with:

$ clang -c -arch i386 runtime.c
$ as -arch i386 example.s -o example.o
$ ld example.o runtime.o -lc -arch i386
$ ./a.out
  • Can you try reservation sizes of 16, 24, 32 ... and of 12, 20, 28 ...? Some library functions (here: `scanf` and `printf`) require the stack pointer to have an 8- or 16-byte alignment. I don't know about MacOS. – Martin Rosenau May 31 '19 at 16:37
  • Mac OS requires 16-byte stack alignment. Please check this question: https://stackoverflow.com/questions/612443/why-does-the-mac-abi-require-16-byte-stack-alignment-for-x86-32 – Ville Krumlinde May 31 '19 at 18:56
  • Counting the `push %ebp`, you're changing the total offset relative to ESP on entry to `main` from 12 bytes to 8. Since your caller pushed a 4-byte return address, 12+4 = 16, which realigns the stack. – Peter Cordes May 31 '19 at 21:23
  • @PeterCordes I actually searched about it before asking, but then why does the code that reserves 8 bytes run? –  Jun 01 '19 at 02:05
  • Because 8+4=12, getting you back to a 16-byte alignment ready for another call. Remember that you're doing a `push %ebp` which also modifies ESP by 4. I'll look for a better duplicate that explains that, instead of just why it's necessary. – Peter Cordes Jun 01 '19 at 06:43
  • There were surprisingly few 32-bit duplicates of this. Many for 64-bit code. I ended up editing a couple existing answers (to bugfix one, and to add this info to one of mine). – Peter Cordes Jun 01 '19 at 08:56
  • @PeterCordes May I ask why calling _print_int_nl also works, considering I pushed %eax to the stack? (Maybe this is because of my misunderstanding of how parameters are passed to a function?) –  Jun 01 '19 at 09:07
  • Not all functions *actually* crash when you violate the ABI. In fact most don't. That only happens when the compiler-generated asm happens to use a `movaps` or `movdqa` on some stack memory that is supposed to be aligned, e.g. to copy 16 bytes at a time. x86-64 GNU/Linux `scanf` does in practice crash on a misaligned stack, but printf happens not to. 32-bit MacOS may be similar. (I didn't look at your code super carefully to figure out what alignment ESP would have. You can check with a debugger single-stepping: just look at the last hex digit of ESP before a `call` instruction.) – Peter Cordes Jun 01 '19 at 09:10
  • @PeterCordes Thanks for your explanation! Then, (when I write assembly) do I always need to track %esp and align the stack manually with `sub $n, %esp` before I call a function? Or is there a saner way to do that? –  Jun 01 '19 at 09:39
  • You could just reserve enough stack space for locals + function args and use `mov %eax, 4(%esp)` or whatever, and mostly avoid `push`/`pop` for making function calls. That can be efficient. If you're writing asm for performance reasons, then work out which will result in the fewest total uops including stack-sync uops. Or you could get to a 16-byte boundary on function entry, and then before each call do the right amount of padding for the number of args. But that's typically less efficient. – Peter Cordes Jun 01 '19 at 09:46
  • Of course stack args in general are inefficient, which is why x86-64 System V passes the first 6 integer and first 8 FP args in registers, except for large structs too big for regs. And with 8-byte stack slots for push/pop, you're only ever 1 away from 16-byte alignment, so it's simpler to keep track of. (But not having to move RSP before most calls is what *really* makes 64-bit code simpler in this respect.) – Peter Cordes Jun 01 '19 at 09:48
  • @PeterCordes Thanks for your explanation! You mentioned duplicates for x86_64; can you post some links to them? I would like some info about x86_64 too. Thanks :-) –  Jun 01 '19 at 10:26
  • [How does \`sub rsp, 16\` aligns the stack on Mac OSX?](//stackoverflow.com/q/29334737) and [glibc scanf Segmentation faults when called from a function that doesn't align RSP](//stackoverflow.com/q/51070716) Google for `site:stackoverflow.com x86-64 "rsp" call stack alignment` if you want many many more. Also see https://stackoverflow.com/tags/x86/info for more links to docs. – Peter Cordes Jun 01 '19 at 10:32
