I am trying to print an array, reverse it, and then print it again. Printing it once works, and two consecutive calls to _printy also work, but the program breaks once _reverse is involved. It does not segfault; it exits with code 24. (I looked this up and it seems to mean that the maximum number of open file descriptors has been exceeded, but I cannot see what that would mean in this context.) I stepped through the code with a debugger and the loop logic seems to make sense.

I am not reloading the array address into RDI before calling _reverse, because _printy restores that register before it returns. I also tried loading it into RDI explicitly before calling _reverse, but that does not solve the problem.
I cannot figure out what the problem is. Any ideas?

BITS 64
DEFAULT REL

; ------------------------------------- 
; ------------------------------------- 
;             PRINT LIST 
; -------------------------------------
; -------------------------------------

%define SYS_WRITE               0x02000004
%define SYS_EXIT                0x02000001
%define SYS_OPEN                0x02000005
%define SYS_CLOSE               0x02000006
%define SYS_READ                0x02000003

%define EXIT_SUCCESS        0
%define STDOUT                  1

%define LF                          10
%define INT_OFFSET      48

section .text
    extern _printf
    extern _puts
    extern _exit

    global _main

_main:
    push rbp
    lea rdi, [rel array]

    call _printy
    call _reverse
    call _printy

    pop rbp 
    call _exit

_reverse:
    push rbp
    lea rsi, [rdi + 4 * (length - 1) ]
    
    .LOOP2:
        cmp rdi, rsi
        jge .DONE2
        
        mov r8, [rdi]
        mov r9, [rsi]

        mov [rdi], r9 
        mov [rsi], r8   

        add rdi,4 
        sub rsi,4 

        jmp .LOOP2

    .DONE2:
        xor rax, rax
        lea rdi, [rel array]
        pop rbp
        ret
        

_printy:
    push rbp

    xor rcx, rcx
    mov r8, rdi

    .loop:
        cmp rcx, length
        jge .done
        
        push rcx
        push r8

        lea rdi, [rel msg]
        mov rsi, [r8 + rcx * 4]
        xor rax, rax
        call _printf
        
        pop r8
        pop rcx

        add rcx, 1
        jmp .loop

    .done: 
        xor rax, rax
        lea rdi, [rel array]
        pop rbp
        ret


section .data
    array: dd 78, 2, 3, 4, 5, 6
    length: equ ($ - array) / 4
    msg: db "%d => ", 0

Edit with some info from the debugger

Stepping into the _printy function gives the following message on reaching the call to _printf:

* thread #1, queue = 'com.apple.main-thread', stop reason = step over failed (Could not create return address breakpoint.)
    frame #0: 0x0000000100003f8e a.out`printf
a.out`printf:
->  0x100003f8e <+0>: jmp    qword ptr [rip + 0x4074]  ; (void *)0x00007ff80258ef0b: printf
    0x100003f94:      lea    r11, [rip + 0x4075]       ; _dyld_private
    0x100003f9b:      push   r11
    0x100003f9d:      jmp    qword ptr [rip + 0x5d]    ; (void *)0x00007ff843eeb520: dyld_stub_binder

I am not an expert, but a quick search online turned up the following:

During the 'thread step-out' command, check that the memory we are about to place a breakpoint in is executable. Previously, if the current function had a nonstandard stack layout/ABI, and had a valid data pointer in the location where the return address is usually located, data corruption would occur when the breakpoint was written. This could lead to an incorrectly reported crash or silent corruption of the program's state. Now, if the above check fails, the command safely aborts.

So this might not be a problem after all (I am still able to follow the execution of the printf call), but it is really the only understandable piece of information I can extract from the debugger. Deep inside some function calls that are quite obscure to me, I reach this:

* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x00007ff80256db7f libsystem_c.dylib`flockfile + 10
libsystem_c.dylib`flockfile:
->  0x7ff80256db7f <+10>: call   0x7ff8025dd480            ; symbol stub for: __error
    0x7ff80256db84 <+15>: mov    r14d, dword ptr [rax]
    0x7ff80256db87 <+18>: mov    rdi, qword ptr [rbx + 0x68]
    0x7ff80256db8b <+22>: add    rdi, 0x8
Target 0: (a.out) stopped.
(lldb) 
Process 61913 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x00007ff8025dd480 libsystem_c.dylib`__error

This is one of the function calls happening in _printf.
Ask further questions if there is something more I can do.

Sep Roland
great coconut
  • This is MacOS I assume, given the syscall numbers and leading underscores on the function names? BTW, `div` is a *very* inefficient way to divide by 2. Use a shift, or since it's an assemble-time constant, `mov eax, length/2`. Also, you're loading / storing 8-byte chunks; `r8` is a qword register, but your elements are dwords. (Also, `xchg` with mem is *very* slow, it's an atomic RMW.) – Peter Cordes Jun 27 '22 at 02:16
  • Normally you'd just start with pointers to the start and end, and walk them in until they cross, instead of doing any calculation of a loop iteration count before running the loop. So 2 mov loads, 2 mov stores. – Peter Cordes Jun 27 '22 at 02:20
  • hi, thank you for the answer! I corrected the code for the reverse function the last bit would just be to store 4 bytes instead of 8. However, stepping again with the debugger, I get an error in the printy function (BTW, yes, I am on MacOs): `x100003f8e <+0>: jmp qword ptr [rip + 0x4074] ; (void *)0x00007ff80258ef0b: printf`. I do not understand what this means. I am saving the registers before running printf, respecting the ABI (rdi, rsi), zeroing al, and then popping the registers that I need. Maybe, also the _exit function could cause some problems (sometimes I can’t print using syscall – great coconut Jun 27 '22 at 05:06
  • I guess printf needs 16 bits alignment but isn’t it aligned after the two pushes anyway? – great coconut Jun 27 '22 at 05:07
  • the first print, prints correctly, then I reverse, but the second print fails – great coconut Jun 27 '22 at 05:11
  • A `jmp` instruction is faulting? That doesn't depend on the stack pointer. That's probably a dynamic linker PLT stub that's just using the function pointer from the GOT entry. It shouldn't crash unless you've overwritten memory somewhere, or somehow built an executable that skips dynamic linking so it doesn't get filled in. I don't know how dynamic linking works on MacOS so IDK if that's a plausible problem. I'd suggest checking the value in memory there, since `0x00007ff80258ef0b` looks like a plausible address. – Peter Cordes Jun 27 '22 at 05:11
  • BTW, the calling convention does not require saving/restoring any arg-passing registers. Your `printy` function can return with RDI modified. Only RBP, RBX, and R12-R15 are call-preserved. So you should save/restore two of those registers before/after your print loop, and use them for your loop variables. (And `main` should redo the LEA to pass an arg to `reverse` and again for the next `print`). But what you're doing now (push/pop inside the loop) isn't wrong, just inefficient. And yeah, each function has an odd number of pushes before a call, so 16 **byte** RSP alignment is maintained. – Peter Cordes Jun 27 '22 at 05:13
  • BTW, it would be better style to have your functions take either 2 pointers or a pointer+length. If you pass a pointer but the functions are using the constant `length` that goes with `array`, that kind of defeats the purpose. Especially having `reverse` do `lea rdi, [rel array]` at the *end* seriously defeats the purpose, so as I said, only main should know about `array`. As for your crashes, I still don't know why that would be happening. It would help to show more context from the debugger, and find out whether it's the *first* `_printf` call in the 2nd call to `printy`, or what. – Peter Cordes Jun 27 '22 at 05:17
  • thanks for your patience, I checked the address in memory, and it’s a garbage value, I will edit the question with info coming from the debugger – great coconut Jun 27 '22 at 05:25
  • I'd suggest setting a watchpoint on it, then, since your first printf works. Your debugger can show you when it's modified again. IDK what could be doing it, though; your `_reverse` is doing overlapping 8-byte loads and stores which might not be correct for `array`, but the 4 bytes after the end of the storage for array is the `msg` format string, so that can't directly be stepping on the GOT pointer. It would be overwriting the `%d` part of the msg string, so I wouldn't expect any output (except some binary byte repeated n times, one per printf), but I also wouldn't expect a crash. – Peter Cordes Jun 27 '22 at 05:31
  • yeah, I’ll try, thank you for all the help, especially on passing arguments to a function. That’s what I would have done in C, but since I am a beginner in assembly I just focus on registers, pushes and jumps, forgetting how to code. BTW how can I solve the push/pop business in the printy function: how can I save the loop counter (rcx) without getting it modified by _printf? (And if I pass the array pointer to the function in, say, rsi, I still have to save it so that it does not get modified by printf again; correct me if I am wrong.) – great coconut Jun 27 '22 at 05:56
  • Don't use RCX as a counter, use RBX instead. (And R12 or something if you need another loop variable). That's the whole point of having [call-preserved registers](https://stackoverflow.com/questions/9268586/what-are-callee-and-caller-saved-registers/56178078#56178078) in a calling convention, so you can use them for loop variables in loops that make function calls. (Or across multiple function calls in a straight line). Your loop should need 2 call-preserved registers: a pointer and an end-pointer (`do{ print *p; }while(++p != endp)`). Or 3 if you use a base, index, and end-index. – Peter Cordes Jun 27 '22 at 06:00
  • yeah, you are right, that’s definitely the purpose of call-preserved registers. Thanks again for all the answers. – great coconut Jun 27 '22 at 06:06
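
The advice in the comments about call-preserved registers could look roughly like the sketch below. It is not the asker's code: the two-argument signature (pointer in RDI, element count in RSI) is an assumption that follows the suggestion to pass a pointer plus a length, and RBX and R12 are just one possible choice of call-preserved registers for the loop variables.

```nasm
BITS 64
DEFAULT REL

section .text
    extern _printf
    global _printy

; Hypothetical signature (an assumption, not from the question):
;   rdi = pointer to an array of 32-bit integers
;   rsi = number of elements
_printy:
    push rbp
    mov  rbp, rsp
    push rbx                    ; call-preserved: must be restored before ret
    push r12                    ; 3 pushes + return address = 32 bytes,
                                ; so RSP is 16-byte aligned at each call

    mov  rbx, rdi               ; rbx = current element pointer
    lea  r12, [rdi + rsi*4]     ; r12 = one past the last element

.loop:
    cmp  rbx, r12
    jae  .done                  ; also covers an empty array

    lea  rdi, [rel msg]         ; 1st arg: format string
    mov  esi, [rbx]             ; 2nd arg: one dword, not a qword
    xor  eax, eax               ; variadic call: no vector registers used
    call _printf                ; rbx and r12 survive the call

    add  rbx, 4
    jmp  .loop

.done:
    pop  r12
    pop  rbx
    pop  rbp
    ret

section .data
    msg: db "%d => ", 0
```

Because the loop state lives in RBX and R12, nothing has to be pushed and popped around each `_printf` call. The caller is expected to set up RDI and RSI itself before every call, since argument registers are not preserved.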

1 Answer

Your array consists of 32-bit integers (`dd` in NASM terminology), but your swap operates on 64-bit values:

    mov r8, [rdi]
    mov r9, [rsi]

    mov [rdi], r9 
    mov [rsi], r8 

Assuming you are not after some crazy optimization that swaps a pair of elements at once, you want the swap to stay 32 bits wide:

    mov r8d, [rdi]
    mov r9d, [rsi]

    mov [rdi], r9d 
    mov [rsi], r8d 
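
Putting this together with the suggestions from the comments, a whole function might look like the sketch below. The pointer-plus-count signature is an assumption taken from the comment thread, not something the question's code uses. As noted in the comments, the original 8-byte store at the last element spills into the first 4 bytes of `msg`, which the dword accesses here avoid.

```nasm
BITS 64
DEFAULT REL

section .text
    global _reverse

; Hypothetical signature (an assumption, not from the question):
;   rdi = pointer to an array of 32-bit integers
;   rsi = number of elements
; Leaf function: it makes no calls, so it needs no stack frame.
_reverse:
    test rsi, rsi
    jz   .done                  ; nothing to do for an empty array
    lea  rsi, [rdi + rsi*4 - 4] ; rsi = pointer to the last element

.loop:
    cmp  rdi, rsi
    jae  .done                  ; pointers met or crossed

    mov  eax, [rdi]             ; 32-bit loads ...
    mov  edx, [rsi]
    mov  [rdi], edx             ; ... and 32-bit stores, one element each
    mov  [rsi], eax

    add  rdi, 4
    sub  rsi, 4
    jmp  .loop

.done:
    ret
```

With this signature, `_main` would load the arguments itself before each call, for example `lea rdi, [rel array]` followed by `mov esi, length`. It is also worth doing `xor edi, edi` before `call _exit`: in the question's code RDI still holds the address of `array` at that point, which may be where the non-zero exit status comes from.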
Kamil.S