I have an assembly function strcmp that I call from C. This is the function:

; strcmp-
;   takes rdi and rsi as the string
;   this function always returns 0 if strings were equal, else 1, stored in rax

BITS 64

section .text
    global strcmp
strcmp:
    push r15
    push r14
    mov r15, 0
    jmp strcmp_loop

strcmp_loop:
    mov r14, [rdi+r15]

    cmp r14, [rsi+r15] ; compare s1[i] and s2[i]
    jne exit_fail

    cmp r14, 0 ; compare s1[i] with NULL. No need to do the same with s2[i] since s1[i] and s2[i] will be equal at this point
    je exit_success

    inc r15
    jmp strcmp_loop

exit_success:
    mov rax, 0
    jmp exit

exit_fail:
    mov rax, 1
    jmp exit

exit:
    pop r14
    pop r15
    ret

And io_funcs.c:

#include "io_funcs.h"

int main() {
    char s[] = "Hello World!\n", test[14];

    puts(s);
    strcpy(test, s);
    puts(test);

    if(!strcmp(s, test)) {
        puts("Equal!!!! :-)");
    } else {
        puts("Unequal!!! :-(");
    }

    putc('\n');

    return 0;
}

I compile this with gcc o/* c/io_funcs.c -static -nostdlib (my assembly sources are assembled with nasm into the directory o/). When I run it, I get:

$ ./a.out 
Hello World!
Hello World!
Unequal!!! :-(

When I debug it a bit, I get:

$ gdb ./a.out -q
Reading symbols from ./a.out...
(gdb) b strcmp
Breakpoint 1 at 0x401120: file asm/strcmp.asm, line 10.
(gdb) run
Starting program: /home/spot/coding/asm/io_funcs/a.out 
Hello World!
Hello World!

Breakpoint 1, strcmp () at asm/strcmp.asm:10
10          push r15
(gdb) p (char *)$rdi
$1 = 0x7fffffffe16a "\210\341\377\377\377\177"
(gdb) p (char *)$rsi
$2 = 0x7fffffffe15c "Hello World!\n"
(gdb) p (char *)$rdx
$3 = 0x7fffffffe15c "Hello World!\n"
(gdb)

This seems to mean that rsi and rdx contain the arguments, and that rdi holds garbage (or something else?).

When I replace this line

    if(!strcmp(s, test)) {

with this

    if(!strcmp("Hello World!\n", test)) {

I get this:

$ gdb ./a.out -q
Reading symbols from ./a.out...
(gdb) b strcmp
Breakpoint 1 at 0x401120: file asm/strcmp.asm, line 10.
(gdb) run
Starting program: /home/spot/coding/asm/io_funcs/a.out 
Hello World!
Hello World!

Breakpoint 1, strcmp () at asm/strcmp.asm:10
10          push r15
(gdb) p (char *)$rdi
$1 = 0x402000 "Hello World!\n"
(gdb) p (char *)$rsi
$2 = 0x7fffffffe15c "Hello World!\n"
(gdb)

but the program still prints Unequal!!! :-(.

But when I replace the same line with

    if(!strcmp("Hello World!\n", "Hello World!\n")) {

the program gives the correct output:

$ ./a.out 
Hello World!
Hello World!
Equal!!!! :-)

Why does this happen? And how can I solve it?

Also note that I implemented all the functions I am using myself, in assembly (just to practice assembly). They all seem to work except for strcmp.

Edit: Following the advice in the comments to check my strings after strcpy executes, I see that the string s has been changed.

$ gdb ./a.out -q
Reading symbols from ./a.out...
(gdb) b 8
Breakpoint 1 at 0x40105f: file c/io_funcs.c, line 8.
(gdb) run
Starting program: /home/spot/coding/asm/io_funcs/a.out 
Hello World!

Breakpoint 1, main () at c/io_funcs.c:8
8           puts(test);
(gdb) p (char *)test
$1 = 0x7fffffffe15c "Hello World!\n"
(gdb) p (char *)s
$2 = 0x7fffffffe16a "\210\341\377\377\377\177"
(gdb)

(Note that strcpy executes at line 7, so I am checking this right after the function call.)

So I might have made a mistake in my strcpy function. Please check it for any errors you can see. strcpy.asm:

; strcpy-
;   takes rdi and rsi as the string
;   this function always returns 0, stored in rax

BITS 64

section .text
    global strcpy
strcpy:
    push r15
    push r14
    mov r15, 0
    jmp strcpy_loop

strcpy_loop:
    mov r14, [rsi+r15]
    mov [rdi+r15], r14

    cmp [rsi+r15], byte 0
    je exit

    inc r15
    jmp strcpy_loop

exit:
    mov rax, 0
    pop r14
    pop r15
    ret
L_R
  • It's either `putc('\n', stdout)` or `putchar('\n')`. – xiver77 Jun 20 '22 at 05:25
  • I cannot reproduce the problem. You can see that the compiler (GCC 12.1) correctly assigns the arguments to `rdi` and `rsi` in order (https://godbolt.org/z/qW8YcPdoM). – xiver77 Jun 20 '22 at 05:26
  • `Unequal!!! :-(` is expected from a function that compares in qword chunks, and doesn't stop until it sees a whole 8-byte qword that's `0`. Your `0x7fffffffe16a "\210\341\377\377\377\177"` is evidence that your `strcpy` does *not* work correctly either. RDX just has a 2nd copy of RSI, from whatever the compiler was doing before the call. Note that it's the same *address* as well as the same contents interpreted as a C string (stopping at the first zero *byte*, not a whole qword) in RSI and RDX, not two different args. – Peter Cordes Jun 20 '22 at 05:27
  • Use GDB to examine the C variable `test` right after your strcpy returns, and/or step into it and single-step it. BTW, you're looking at arg *passing*, not what's left in registers when a function *returns*. (Functions are allowed to leave any garbage they want in the arg-passing registers, and R10 and R11. Only RBX, RBP, and R12-R15 are call-preserved.) – Peter Cordes Jun 20 '22 at 05:30
  • @PeterCordes I use puts to check whether test is correct, and you can see the result yourself (I have posted it). Both `s` and `test` print the same string, which is `Hello World!\n`. – L_R Jun 20 '22 at 08:43
  • @PeterCordes Comparing qword by qword two ASCIIZ byte strings leads to undefined behavior near the NULL bytes if the "end" part is not handled in a "byte" way. It's probably the main reason for the observed behavior, why did you not directly point this out ? – Zilog80 Jun 20 '22 at 08:43
  • @Zilog80: I did say "*doesn't stop until it sees a whole 8-byte qword that's `0`*". I thought that was specific enough about what the bug was (not looking at individual bytes), but yeah I guess I could have linked [How to load a single byte from address in assembly](https://stackoverflow.com/q/20727379) even if I decided not to close this as a duplicate of it. – Peter Cordes Jun 20 '22 at 08:59
  • @user1234: The bytes we can see with GDB are outside the printable ASCII range, and might be invalid UTF-8. On a terminal, they might print as the empty string. Pipe your program's output into `hexdump -C`, and/or run it under `strace` to see the actual binary data that gets passed to a `write` system call. (If it's not the same as what GDB shows RDI points to, your puts function is also broken.) – Peter Cordes Jun 20 '22 at 09:04
  • It's also possible that one call to `puts` output Hello World twice, and the other output nothing, but that seems a bit less likely. `test` is at a higher address, not lower, so an unterminated test[] wouldn't read into the stack space for `s[]`. But again, without single-stepping and checking memory contents, you can't be sure. Especially if you wrote your own puts, I'd trust GDB more than it. – Peter Cordes Jun 20 '22 at 09:06
  • Re: your `strcpy`: ok, it uses a different strategy for finding the terminating byte, so it will stop at the right place. But it will have copied 7 bytes past the end, since you have the same bug of loading and storing qwords instead of bytes, but incrementing your pointer by 1. It would even segfault if you tried to copy a string near the end of a page, and the next page wasn't mapped. Or in this case, it potentially corrupted some stack variable after the end of the destination, for example the start of `s`. – Peter Cordes Jun 20 '22 at 09:09
  • Oh also, you call `puts(s)` *before* your `strcpy`, so it couldn't detect bugs in your strcpy corrupting its start. Yeah, that's gotta be it. My previous comment was wrong, **`test[]` is at a *lower* address than `s[]`, so writing past the end of `test[]` steps on the start of `s[]`.** So it does all just come down to the same bug as your strcmp: how to load a single byte, so it's a duplicate of [How to load a single byte from address in assembly](https://stackoverflow.com/q/20727379) – Peter Cordes Jun 20 '22 at 09:10

0 Answers