This is the code I am playing with right now:

# file-name: test.s
# 64-bit GNU as source code.
    .global main

    .section .text
main:
    lea message, %rdi
    push %rdi
    call puts

    lea message, %rdi
    push %rdi
    call printf

    push $0
    call _exit

    .section .data
message: .asciz "Hello, World!"

Compilation instructions: gcc test.s -o test

Revision 1:

    .global main
    .section .text
main:
    lea message, %rdi
    call puts

    lea message, %rdi
    call printf

    mov $0, %rdi
    call _exit

    .section .data
message: .asciz "Hello, World!"

Final Revision (Works):

    .global main
    .section .text
main:
    lea message, %rdi
    call puts

    mov $0, %rax
    lea message, %rdi
    call printf

    # flush stdout buffer.
    mov $0, %rdi
    call fflush

    # Print a newline so the shell prompt (PS1) starts on its own line when the program ends.
    # - Ironically, when stdout is a line-buffered terminal, the newline makes the flush above redundant, so it could be removed.
    # - The call to fflush is retained for display and
    #      to keep the block self-contained.
    mov $'\n', %rdi
    call putchar

    mov $0, %rdi
    call _exit

    .section .data
message: .asciz "Hello, World!"

I am struggling to understand why the call to puts succeeds but the call to printf results in a Segmentation fault.

Can somebody explain this behavior and how printf is intended to be called?

Thanks ahead of time.


Summary:

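  1. printf takes the format string in %rdi and, because it is a variadic function, the number of arguments passed in vector (XMM) registers in %al, the low byte of %rax. With no floating-point arguments, %al must be 0.
  2. printf output stays in the stdout buffer until a newline is written (when stdout is a line-buffered terminal), fflush is called, or the program ends through exit(3) or a return from main. _exit skips that flush.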
Peter Cordes
Dmytro
  • If you're on Linux then when calling `printf` (and other functions that take a variable number of arguments) you need to load AL with the number of arguments you're passing in XMM registers. In this case the number would be 0. – Ross Ridge Apr 14 '16 at 20:29
  • what is the AL register on 64 bit system? – Dmytro Apr 14 '16 at 20:38
  • It's the lower 8-bits of the RAX register. You can refer to it by `%al`, eg. `mov $0, %al` though `xor %eax,%eax` is probably preferable in this case. – Ross Ridge Apr 14 '16 at 20:53
  • I figured it out. I was confused because after passing 0 in %rax it felt like nothing happened, but then I remembered that I should have tried flushing after printing. Updated OP with current solution. – Dmytro Apr 14 '16 at 20:54
  • I updated my answer with some improvements and comments on your final version – Peter Cordes Apr 15 '16 at 00:43

1 Answer

puts appends a newline implicitly, and stdout is line-buffered (by default on terminals). So the text from printf may just be sitting there in the buffer. Your call to _exit(2) doesn't flush buffers, because it's the exit_group(2) system call, not the exit(3) library function. (See my version of your code below).

Your call to printf(3) is also not quite right, because you didn't zero %al before calling a var-args function with no FP arguments. (Good catch @RossRidge, I missed that). xor %eax,%eax is the best way to do that. %al will be non-zero (from puts()'s return value), which is presumably why printf segfaults. I tested on my system, and printf doesn't seem to mind when the stack is misaligned (which it is, since you pushed twice before calling it, unlike puts).
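For contrast, a call that does pass a floating-point argument needs a non-zero %al. The following is a minimal sketch, assuming Linux x86-64 (System V ABI) and a build with gcc as in the question. The labels fmt and value are made up for illustration and do not come from the question.

```asm
# Sketch of printf("pi is about %f\n", 3.14159) in GNU as (AT&T syntax).
# fmt and value are hypothetical labels.
    .text
    .global main
main:
    sub     $8, %rsp              # realign the stack to 16 bytes before the call
    lea     fmt(%rip), %rdi       # 1st integer arg: pointer to the format string
    movsd   value(%rip), %xmm0    # 1st floating-point arg goes in %xmm0
    mov     $1, %eax              # %al = 1: one vector register carries an argument
    call    printf
    add     $8, %rsp              # undo the alignment adjustment
    xor     %eax, %eax            # return 0; returning from main lets exit(3) flush stdout
    ret

    .section .rodata
fmt:    .asciz "pi is about %f\n"
    .balign 8
value:  .double 3.14159
```

If the toolchain builds position-independent executables by default and rejects the relocations, linking with gcc's -no-pie option is one way around it.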


Also, you don't need any push instructions in that code. The first arg goes in %rdi. The first 6 integer args go in registers, the 7th and later go on the stack. You're also neglecting to pop the stack after the functions return, which only works because your function never tries to return after messing up the stack.

The ABI does require aligning the stack by 16B, and a push is one way to do that, which can actually be more efficient than sub $8, %rsp on recent Intel CPUs with a stack engine, and it takes fewer bytes. (See the x86-64 SysV ABI, and other links in the tag wiki).
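To make the register/stack split and the alignment rule concrete, here is a hedged sketch of a printf call with seven arguments, assuming Linux x86-64 and the System V ABI. The seventh argument is pushed, and that single push also happens to restore 16-byte alignment, because main is entered with the stack 8 bytes off alignment. The label fmt is made up for this example.

```asm
# Sketch of printf("%d %d %d %d %d %d\n", 1, 2, 3, 4, 5, 6).
# fmt is a hypothetical label.
    .text
    .global main
main:
    lea     fmt(%rip), %rdi       # arg 1: format string
    mov     $1, %esi              # arg 2
    mov     $2, %edx              # arg 3
    mov     $3, %ecx              # arg 4
    mov     $4, %r8d              # arg 5
    mov     $5, %r9d              # arg 6: the last integer register
    push    $6                    # arg 7: on the stack; the push also realigns %rsp to 16 bytes
    xor     %eax, %eax            # %al = 0: no arguments in vector registers
    call    printf
    add     $8, %rsp              # caller removes the stack argument
    xor     %eax, %eax            # return 0
    ret

    .section .rodata
fmt:    .asciz "%d %d %d %d %d %d\n"
```

With an even number of stack arguments, an extra 8-byte adjustment would be needed before pushing them to keep %rsp aligned at the call.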


Improved code:

.text
.global main
main:
    lea     message(%rip), %rdi   # RIP-relative LEA is position-independent.
                                  # Or use  mov $message, %edi  if you don't need position-independent code:
                                  # the default code model has all labels in the low 2G, so shorter 32-bit instructions work.
    push    %rbx              # align the stack for another call
    mov     %rdi, %rbx        # save for later
    call   puts

    xor     %eax,%eax         # %al = 0 = number of FP args for var-args functions
    mov     %rbx, %rdi        # or mov %ebx, %edi  will normally be safe, since the pointer is known to be pointing to static storage, which will be in the low 2G
    call   printf

    # optionally putchar a '\n', or include it in the string you pass to printf

    #xor    %edi,%edi    # exit with 0 status
    #call  exit          # exit(3) does an fflush and other cleanup

    pop     %rbx         # restore caller's rbx, and restore the stack

    xor     %eax,%eax    # return 0
    ret

    .section .rodata     # constants should go in .rodata
message: .asciz "Hello, World!"

The lea is cheap, and repeating it would take fewer instructions than the two mov instructions needed to make use of %rbx. But since we needed to adjust the stack by 8B to strictly follow the ABI's 16B-alignment guarantee, we might as well do it by saving a call-preserved register. mov reg,reg is very cheap and small, so taking advantage of the call-preserved register is natural.

Using mov %edi, %ebx and similar 32-bit forms saves the REX prefix in the machine-code encoding. If you're not sure why it's safe to copy only the low 32 bits (which zeroes the upper 32 bits of the destination), then use 64-bit registers. Once you understand what's going on, you'll know when you can save machine-code bytes by using 32-bit operand-size.
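One way to see the size difference is to assemble a few variants and disassemble the object file. The snippet below is a sketch meant only for inspection, not for calling. The file name sizes.s and the label encoding_demo are hypothetical, and the bytes in the comments are the encodings I would expect GNU as to emit for these forms.

```asm
# Assemble and inspect with:  as sizes.s -o sizes.o && objdump -d sizes.o
    .text
encoding_demo:                    # hypothetical label, never called
    mov     %rdi, %rbx            # 48 89 fb              REX.W prefix, 3 bytes
    mov     %edi, %ebx            # 89 fb                 2 bytes, zero-extends into %rbx
    mov     $0, %rax              # 48 c7 c0 00 00 00 00  7 bytes
    mov     $0, %eax              # b8 00 00 00 00        5 bytes
    xor     %eax, %eax            # 31 c0                 2 bytes, zeroes all of %rax
```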

Peter Cordes
  • by the way, why is xor used in this example? I see it used often but can't make sense of the reasoning behind it. Is it cheaper? – Dmytro Apr 15 '16 at 00:59
  • @Dmitry: I already made a change to the text which added a link about xor. – Peter Cordes Apr 15 '16 at 01:02
  • My bad. I'll memorize this idiom. Although now i'm tempted to replace all of my 0 assignments in C with ^=... I'm not sure how good of an idea that is. Thanks for the solution, it's a lot to take in. I'll take some time studying it. – Dmytro Apr 15 '16 at 01:09
  • @Dmitry: that's a terrible idea in C. It's optimal on x86, not in general. On most architectures, `reg = reg^reg` will have a false dependency on the old value. It makes your code less readable, and there's no way to do it without having the C source use the uninitialized value. ([Compilers don't like that](http://stackoverflow.com/questions/32408665/fastest-absolute-value-calculator-using-sse/32422471#comment53135971_32422471). note the typo in that comment, though: it should be `_mm_undefined_si128()`, not `_mm_uninitialized_si128()`.) – Peter Cordes Apr 15 '16 at 01:58
  • Anyway, it's just horrible for human-readability as well. It's the standard idiom in x86, not in C. **Compilers know about this and will do it correctly**. They will always emit `xor same,same` instead of `mov $0, %reg`, except at `-O0` when they don't check for that peephole optimization. – Peter Cordes Apr 15 '16 at 02:00
  • still an interesting thing to understand. I like being able to read back what the compiler produces and understand its optimizations, or lack thereof. – Dmytro Apr 15 '16 at 03:40
  • @Dmitry: yes, for sure. You can learn a lot from understanding compiler output from fairly simple functions. It's a mistake to turn around and apply some of the techniques to the C source, though. In this case, one of the many facets of that mistake is that it's only how x86 does things, not optimal on e.g. ARM or MIPS. (on MIPS, you zero `r3` with `addiu r3, r0, #0`: add immediate-zero to the architectural zero-register, and store the result in `r3`. There is no `move` insn, just a pseudo-op mnemonic for it. It would be similarly silly to write `a = b+0` instead of `a = b` in C, though) – Peter Cordes Apr 15 '16 at 04:48
  • @Dmitry: Actually it's not quite true that you shouldn't apply what you learn to C. You can sometimes change the C to make it easier / possible for the compiler to use arch-specific peephole optimizations like `LEA`. More generally, though, you can structure code in ways that will work well in asm. If you know an input can't be zero, then you can use a `do { } while()` loop instead of a `while()` loop, so the compiler doesn't have to check to make sure the loop runs at least once. Hoisting loop invariants can help, and so can manual unrolling with manual intro/epilogue for odd numbers. – Peter Cordes Apr 15 '16 at 05:24
  • One more question: how do I call printf("%u", a_number_in_data_section); in assembly? I can't manage to do it, I seem to be misunderstanding the role of %eax here. – Dmytro Apr 15 '16 at 20:45
  • OK, I looked at the compiler output and found that rax was still 0 for the printf with 2 arguments (I thought I needed to pass 1). All I had to do was pass the string in rdi, the second argument in rsi, 0 in rax, and call printf. Thanks! – Dmytro Apr 15 '16 at 22:29
  • @Dmitry: yup. `%al` is the number of floating point args (passed in xmm registers) to a var-args function. Still zero since pointers and unsigned int are both integer args. – Peter Cordes Apr 15 '16 at 22:44
  • @Bulat: Try your edit again, I noticed a couple things after coming to see how this answer was doing after getting the upvote notification, and my edit overwrote yours. – Peter Cordes Sep 19 '16 at 06:49
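
The two-argument call discussed in the comments above, printf("%u", a_number_in_data_section), can be sketched as follows. This assumes Linux x86-64 and the System V ABI. The labels fmt and number are invented for the example, and a newline is added to the format string so the output is not left sitting beside the shell prompt.

```asm
# Sketch of printf("%u\n", number), where number lives in .data.
# fmt and number are hypothetical labels.
    .text
    .global main
main:
    sub     $8, %rsp              # realign the stack to 16 bytes before the call
    lea     fmt(%rip), %rdi       # arg 1: format string
    mov     number(%rip), %esi    # arg 2: the 32-bit value itself, not its address
    xor     %eax, %eax            # %al = 0: both arguments are integer-class
    call    printf
    add     $8, %rsp              # undo the alignment adjustment
    xor     %eax, %eax            # return 0
    ret

    .section .rodata
fmt:    .asciz "%u\n"

    .data
number: .long 42
```

%al stays 0 here because neither argument is passed in a vector register. It counts floating-point arguments, not the total number of arguments.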