
I am trying to use printf to print the first character of the string str_1 to stdout in x86 assembly in a 64-bit Ubuntu 20 environment. Here is my attempt:

; nasm -f elf32 test.asm -o test.o && gcc -m32 -o test test.o
section .text
global  main
extern printf

some_proc:
    mov esi, str_1

    mov eax, [esi]
    push eax
    push argv_str
    call printf

    pop eax
    ret

main:
    call some_proc

    ret

section  .data
    str_1        db `three`
    argv_str     db `%c\n`

This outputs:

t
Segmentation fault (core dumped)

Expected stdout:

t

Why does this code result in a segmentation fault, and how do I modify it to produce the expected output?

dnsis_445
  • Have you tried using a debugger to see where the fault actually occurs? – Thomas Jager Nov 06 '20 at 16:12
  • By having your entry point be `_start`, you're bypassing all of the standard library's initialization code, and so you can't expect any standard library functions such as `printf` to work properly. This is only appropriate for programs that don't need the C library at all, and will do all of their work through raw system calls. If you need the C library then you need to have your program's entry point be `main`. – Nate Eldredge Nov 06 '20 at 16:25
  • You have another bug in that `mov eax, [esi]` loads 4 bytes when you only want 1. – Nate Eldredge Nov 06 '20 at 16:26
  • Oh, but the bug that probably actually causes your crash is that you push arguments on the stack for printf, and it's your responsibility to pop them back off, but you don't. – Nate Eldredge Nov 06 '20 at 16:27
  • How do I load only a single byte? Even after changing the code's entry point to `main` and compiling with `gcc`, I still receive a segmentation fault. – dnsis_445 Nov 06 '20 at 16:49
  • @NateEldredge I modified my example to represent what you recommended. Still I receive the same stdout. – dnsis_445 Nov 06 '20 at 16:51
  • You pushed twice but popped only once. In order for `ret` to return to the right place, the stack needs to be back in exactly the same state as when the function was entered. The call to `printf` does likewise; unlike in some other calling conventions, it doesn't remove its arguments from the stack. – Nate Eldredge Nov 06 '20 at 20:00

1 Answer


You have several bugs:

  • You push two 4-byte arguments onto the stack for printf. In SysV calling conventions, printf will leave them there, and so it is your responsibility to adjust the stack afterwards to "remove" them. Remember that ret will look for a return address at the top of the stack; as your code stands, what will be there is the character value from eax that you pushed. That's not a valid address, so trying to return there causes a segfault. You can remove those arguments by popping twice, or more efficiently by simply adding 8 to esp, thus moving the stack pointer back to where it was.

  • Current versions of the i386 SysV ABI require the stack to be aligned to 16 bytes just before calling any function. Thinking about the fact that call itself pushes 4 bytes on the stack as the return address, as does every push instruction, you can work out the necessary adjustments needed for your calls to some_proc and to printf, and add or subtract from esp as appropriate. (Technically you could avoid aligning the stack before calling some_proc and just fix it up before printf, but this is too easy to screw up.) Some 32-bit libraries may be compiled in such a way that this requirement is not enforced, but 64-bit code definitely needs it, so it is a good habit to comply.

  • esi is a callee-saved register according to i386 SysV ABI calling conventions (memorize these!). If you want to modify it, you have to save the previous contents and restore them before returning (e.g. push esi at the top of the function and pop esi at the end). Or choose a caller-saved register such as ecx instead. However, as noted below, you don't really need to use a register for the address of str_1 at all.

  • mov eax, [esi] is a 32-bit load because eax is a 32-bit register. So this will load eax with the 4 bytes from location str_1, which will result in it containing the value 0x65726874 (the bytes t h r e as a little-endian integer). This may not actually cause a problem since printf is supposed to convert its int argument back to unsigned char for printing, so you should only get the low byte 0x74 = 't', but it is still weird, and could break if your string was very short and adjacent to an unmapped page.

    Safer would be mov al, [esi] which just loads one byte into al, which is the low byte of eax, but whatever garbage is in the high 3 bytes will stay there. You could zero out eax beforehand with xor eax, eax, but you can also kill two birds with one stone with the movzx instruction, which zero-extends a smaller operand into a larger one: movzx eax, byte [esi].

    Of course, putting the address into esi first is redundant, since the address can be specified as an immediate: mov al, [str_1] or movzx eax, byte [str_1]. This then avoids the need to save/restore esi.

  • main is expected to return an exit code, and return values always go in eax. Your eax would contain your characters or maybe the return value from printf, depending where your push/pops end up. Any of those will be a weird nonzero exit code and your shell will think the program encountered an error. So zero out eax before returning from main, to indicate success.

  • argv_str is a strange name for a string that has nothing to do with argv.
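
The difference between the 32-bit load and the movzx byte load described above can be checked with a small test program. This is only a sketch, not part of the original code: the label fmt_hex, the file name loads.asm, and the build line are assumptions made for illustration, and the build line may additionally need -no-pie on toolchains that link position-independent executables by default.

```nasm
; build (assumed): nasm -f elf32 loads.asm -o loads.o && gcc -m32 -o loads loads.o
section .text
global  main
extern  printf

main:
    sub   esp, 4               ; padding so esp is 16-byte aligned at each call below

    mov   eax, [str_1]         ; 32-bit load: picks up the bytes 't' 'h' 'r' 'e'
    push  eax
    push  fmt_hex
    call  printf               ; prints 65726874
    add   esp, 8               ; remove the two 4-byte arguments

    movzx eax, byte [str_1]    ; 8-bit load, zero-extended to 32 bits
    push  eax
    push  fmt_hex
    call  printf               ; prints 00000074
    add   esp, 8

    xor   eax, eax             ; exit code 0
    add   esp, 4               ; undo the padding
    ret

section .data
    str_1    db "three", 0
    fmt_hex  db "%08x", 10, 0  ; hypothetical format string: 8 hex digits, newline, NUL
```

Both values have 0x74 ('t') as their low byte, which is why %c prints the same character either way; only the movzx version avoids reading past the first byte.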

I would modify your program as follows:

; nasm -f elf32 test.asm -o test.o && gcc -m32 -o test test.o
section .text
global  main
extern printf

some_proc:
    sub esp, 4 ; 8 more bytes pushed before call to printf
    movzx eax, byte [str_1]
    push eax
    push argv_str
    call printf
    add esp, 12
    ret

main:
    sub esp, 12
    call some_proc
    xor eax, eax
    add esp, 12
    ret

section  .data
    str_1        db `three`, 0
    argv_str     db `%c\n`, 0   ; NUL-terminated so printf knows where the format ends
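
For comparison, here is a sketch of the other option mentioned in the list of bugs: keeping esi but saving and restoring it. It is one possible arrangement, not the only one. The label fmt_char is a hypothetical name, and the code assumes main is entered by a caller that kept the stack 16-byte aligned before its call.

```nasm
section .text
global  main
extern  printf

some_proc:
    push  esi                  ; save callee-saved esi; these 4 bytes also act as padding
    mov   esi, str_1
    movzx eax, byte [esi]      ; load one byte, zero-extended
    push  eax
    push  fmt_char             ; 4 (return address) + 4 (esi) + 8 (arguments) = 16
    call  printf
    add   esp, 8               ; remove the two printf arguments
    pop   esi                  ; restore the caller's esi
    ret

main:
    sub   esp, 12              ; 4 (return address) + 12 = 16, aligned before the call
    call  some_proc
    xor   eax, eax             ; exit code 0
    add   esp, 12
    ret

section .data
    str_1     db "three", 0
    fmt_char  db "%c", 10, 0   ; hypothetical format string: one character, newline, NUL
```

Because the saved esi occupies 4 bytes, it replaces the sub esp, 4 that would otherwise be needed in some_proc, and the matching pop esi puts the stack back exactly where it was before ret.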

Nate Eldredge
  • How does one know exactly how much you should increment/decrement the stack before calling functions/procedures? – dnsis_445 Nov 07 '20 at 17:31
  • @dnsis_445: Basically, you think about how to get it to be a multiple of 16, based on what other changes you have made to the stack pointer. See https://stackoverflow.com/a/64729675/634919 and dozens of other questions on this site about stack alignment. – Nate Eldredge Nov 07 '20 at 17:59
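
As a worked instance of the "multiple of 16" reasoning in the comment above, the sketch below tracks esp modulo 16 through a call that takes three 4-byte arguments. It assumes that main's caller aligned the stack before its call, so that esp ≡ 12 (mod 16) on entry, and that the function pushes nothing besides the arguments. The label fmt_two is a hypothetical name.

```nasm
section .text
global  main
extern  printf

main:
    ; On entry the 4-byte return address sits on an aligned stack: esp ≡ 12 (mod 16).
    ; Three arguments take 12 bytes, and 4 + 12 = 16, so no extra padding is needed.
    movzx eax, byte [str_1 + 1]
    push  eax                  ; second character ('h'): esp ≡ 8
    movzx eax, byte [str_1]
    push  eax                  ; first character ('t'):  esp ≡ 4
    push  fmt_two              ; format string:          esp ≡ 0, aligned at the call
    call  printf               ; prints "th"
    add   esp, 12              ; remove the three arguments
    xor   eax, eax             ; exit code 0
    ret

section .data
    str_1    db "three", 0
    fmt_two  db "%c%c", 10, 0  ; hypothetical format string: two characters, newline, NUL
```

Under the same assumptions, the padding to subtract from esp before pushing the arguments is (12 - argument bytes) mod 16: 12 bytes for a call with no arguments, 4 bytes for two 4-byte arguments, and 0 bytes for three.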