I have a program which prints text and a floating-point number. Here is a version which does not use main:

extern printf
extern _exit

section .data
    hello:     db 'Hello world! %f',10,0
    pi:        dq  3.14159
section .text
    global _start
_start:
    xor eax, eax
    lea rdi, [rel hello]
    movsd xmm0, [rel pi]
    mov eax, 1
    call printf
    mov rax, 0
    jmp _exit

I assemble and link it like this:

nasm -felf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -melf_x86_64

This runs fine. However, now I want to do this using main.

global main
extern printf

section .data
    hello:     db 'Hello world! %f',10,0
    pi:        dq  3.14159
section .text
    main:
    sub rsp, 8
    xor eax, eax
    lea rdi, [rel hello]
    movsd xmm0, [rel pi]
    mov eax, 1
    call printf
    mov rax, 0
    add rsp, 8
    ret

I assemble and link it like this:

nasm -felf64 hello_main.asm
gcc hello_main.o

This runs fine as well. However, I had to subtract eight bytes from the stack pointer before calling printf and add them back afterwards; otherwise I get a segmentation fault.

Looking at the stack pointer, I see that without main it is 16-byte aligned, but with main it is only 8-byte aligned. The fact that eight bytes have to be subtracted and added back suggests that inside main it is always 8-byte aligned and never 16-byte aligned (unless I misunderstand something). Why is this? I thought that with x86_64 code we could assume the stack is 16-byte aligned (at least for standard library function calls, which I would think includes the call to main).

Z boson

1 Answer


According to the ABI, the stack pointer must be 16-byte aligned at the point of a call, which means that (stack pointer + 8) is 16-byte aligned upon entry to a function. The reason you have to subtract 8 is that the call into main itself placed 8 bytes of return address on the stack, so on entry to main the stack pointer is 8 bytes off a 16-byte boundary and must be realigned before you call printf. Basically, you have to make sure the total stack pointer movement is a multiple of 16, including the return address. Thus, inside a function, the stack pointer needs to be moved by a multiple of 16, plus 8 to account for the return address.
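As a sketch of that accounting, any 8-byte adjustment restores the alignment, so pushing a register works just as well as `sub rsp, 8`. The version below assumes NASM and linking with gcc as in the question. The `wrt ..plt` suffix and the choice of `rbx` are my additions, not part of the original code; `wrt ..plt` is there so the call also links when gcc builds a position-independent executable by default.

```nasm
global main
extern printf

section .data
    hello:     db 'Hello world! %f',10,0
    pi:        dq  3.14159
section .text
main:
    ; On entry: rsp % 16 == 8 (the caller's call pushed an 8-byte return address).
    push rbx                    ; 8 more bytes: rsp % 16 == 0 again
    lea rdi, [rel hello]        ; first argument: format string
    movsd xmm0, [rel pi]        ; first floating-point argument
    mov eax, 1                  ; one vector register is used by the variadic call
    call printf wrt ..plt       ; rsp is 16-byte aligned at the call, as the ABI requires
    xor eax, eax                ; return 0 from main
    pop rbx                     ; undo the push: rsp % 16 == 8, as on entry
    ret
```

The total movement before the call to printf is 8 (return address) + 8 (push) = 16 bytes. A function that needs local space would subtract a further multiple of 16.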

As for _start, I don't think you can rely on it working without manual alignment either. It just so happens that in your case it works due to the things already on the stack.
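If you would rather not depend on the state of the stack at `_start`, a minimal sketch is to align it explicitly before the first call. This assumes the same `ld` command line as in the question. Calling `exit` instead of jumping to `_exit` is my change, made so that the exit status is set explicitly and stdio buffers are flushed.

```nasm
global _start
extern printf
extern exit

section .data
    hello:     db 'Hello world! %f',10,0
    pi:        dq  3.14159
section .text
_start:
    and rsp, -16                ; round rsp down to a 16-byte boundary
    lea rdi, [rel hello]        ; first argument: format string
    movsd xmm0, [rel pi]        ; first floating-point argument
    mov eax, 1                  ; one vector register is used by the variadic call
    call printf                 ; rsp is 16-byte aligned at the call
    xor edi, edi                ; exit status 0
    call exit                   ; does not return
```

Nothing ever returns from `_start`, so rounding `rsp` down without saving the old value is safe here. If the stack is already aligned at process entry, the `and` changes nothing.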

Jester
  • I checked out the [abi](http://www.x86-64.org/documentation/abi.pdf). It says "the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point". So that explains why main is 16+8 aligned. But then why is `_start` not 16+8 aligned? – Z boson Nov 11 '14 at 16:05
  • Actually, section `3.4.1` says `rsp: it is guaranteed to be 16-byte aligned at process entry`. – Jester Nov 11 '14 at 16:17
  • Oh, good observation, you mean for `_start`. Okay, I need to do some reading and testing now. This is the disadvantage of starting with amd64 assembly because you can do a lot without using the stack. If I had started with 32-bit mode I would be familiar with the stack already. – Z boson Nov 11 '14 at 19:29