While I know it is best to use compiler intrinsics, and for that matter, printf_chk, and also to put data in .rodata sections, I'm looking at gaining a deeper understanding of assembly language and am interested in compact code. There is something about printf I don't understand. I know where to put the parameters, and I know how to use %al for varargs, but it appears to be requiring additional stack space that I cannot account for.

This short program

        .text
        .globl  main
main:
        movsd   value(%rip), %xmm0    # value to print
        movl    $format, %edi         # format string
        movl    $1, %eax              # one floating-point arg
        call    printf
        movl    $0, %eax              # return 0 from main
        ret
        .align 8
value:  .double 74.321 
format: .asciz "%g\n"

gives a segfault.

However, when I add additional stack space to the frame, it works fine:

        .text
        .globl  main
main:
        subq    $8, %rsp              # ADD SOME STACK SPACE TO FRAME (WHY?)
        movsd   value(%rip), %xmm0    # value to print
        movl    $format, %edi         # format string
        movl    $1, %eax              # one floating-point arg
        call    printf
        movl    $0, %eax              # return 0 from main
        addq    $8, %rsp              # REMOVE ADDED STACK SPACE
        ret
        .align 8
value:  .double 74.321 
format: .asciz "%g\n"

Could it be an alignment issue? (I get the same problem when value and format are in an .rodata section.)

Mysticial
Ray Toal

1 Answer

The stack must be 16-byte aligned at the point of a call, according to both the x86-64 System V ABI (www.x86-64.org/documentation/abi.pdf) and Microsoft's x64 documentation (http://msdn.microsoft.com/en-us/library/ms235286(v=vs.80).aspx).

mattst88
  • Accepted! Verified by changing the value from 8 to 16 and getting a segfault, then changing to 24 and not getting the segfault. It is interesting that this _only_ happens when I use printf with vector registers, though. I don't see any real distinction between my first example which works fine _without_ tweaking rsp, and the second one which requires the tweaking. What is it about the second version that gets the stack out of alignment while the first one does not? – Ray Toal Apr 28 '12 at 02:25
  • Ah, I remember now: the stack is 16-byte aligned at the beginning of main(). The call instruction pushed the 8-byte return address onto the stack, which misaligns it and causes you to need to subq some odd multiple of 8 bytes to realign it. Why a misaligned stack causes a seg fault only when using a vector register (a register! not the stack!) isn't entirely clear to me. Probably a lack of understanding of how varargs work. – mattst88 Apr 28 '12 at 05:42
  • In the x86-64 System V ABI (used on Linux, which this appears to be, judging by `.rodata`), `rsp` is 16-byte aligned *before* a `call`, so at the start of a normal function like `main`, `rsp-8` is aligned by 16, ready for another `call`. Variadic functions take advantage of this guarantee when `al>0` by dumping xmm0..7 to the stack with 16B-aligned stores. (This is because `__m128` args can be passed to variadic functions, and gcc doesn't optimize its code-gen for variadic functions that never end up looking for an FP register arg wider than 8 bytes.) – Peter Cordes Apr 18 '18 at 20:52
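
To tie the comments above together, here is a minimal sketch of the same program with the stack realigned by a single 8-byte push instead of `subq $8, %rsp`. It assumes x86-64 Linux, the System V ABI, and GNU as (AT&T) syntax. The `leaq format(%rip), %rdi`, `call printf@PLT`, and `.section .rodata` lines are my own substitutions so that it also links as a position-independent executable; they are not part of the original question.

```asm
        .text
        .globl  main
main:
        # On entry rsp is 8 past a 16-byte boundary, because the caller's
        # call pushed an 8-byte return address onto an aligned stack.
        pushq   %rbp                  # any 8-byte push realigns rsp to 16
        leaq    format(%rip), %rdi    # format string, RIP-relative (assumption: PIE-safe form)
        movsd   value(%rip), %xmm0    # value to print
        movl    $1, %eax              # al = number of vector registers used by varargs
        call    printf@PLT            # rsp is 16-byte aligned here, as the ABI requires
        xorl    %eax, %eax            # return 0 from main
        popq    %rbp                  # undo the push so ret finds the return address
        ret

        .section .rodata
        .align 8
value:  .double 74.321
format: .asciz "%g\n"
```

The push is one byte of machine code, against four for `subq $8, %rsp`, which may matter if compact code is the goal. Any adjustment that is an odd multiple of 8 bytes (8, 24, 40, ...) restores the alignment, which matches the 8/16/24 experiment reported in the first comment.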