
As a preface, I'm running this on a 64-bit Linux OS and am thus using the 64-bit Linux syscalls.

My strlen procedure takes a pointer to the string in rdi and returns the length of the string, excluding the terminating \0, in rax. (The string you pass in must end with \0.) I have tested this procedure and know it works correctly.

My puts procedure takes rdi as its parameter, uses strlen to get the length and then uses the write syscall.

Here is a (hopefully) minimal reproducible example:

section .data
    msg db "HELLO", 10, 0

section .text
    global _start
_start:
    mov  rdi, msg
    call puts
    mov  rax, 0x3c
    xor  rdi, rdi
    syscall

strlen:
    push rdi
    push rcx
    push rsi

    xor rax, rax
    mov rcx, 0xffffffff
    repnz scasb
    jnz .error
    not rcx
    dec rcx
    mov rax, rcx
.return:
    pop rsi
    pop rcx
    pop rdi
    ret
.error:
    mov rax, -1
    jmp .return

puts:
    mov rsi, rdi
    call strlen
    mov rdx, rax
    mov rax, 1
    mov rdi, rax
    syscall
    ret

I'm really confused as to why this isn't working. My puts procedure is just a simple syscall really, so I honestly don't know what is going wrong here.

mediocrevegetable1

1 Answer


I have tested [my strlen] and know it works correctly.

I'm afraid not.

mov rcx, 0xffffffff

You seem to think this loads rcx with -1, but it doesn't: 0xffffffff is -1 only as a 32-bit signed integer, and rcx is a 64-bit register, so it ends up holding 0x00000000ffffffff. (Maybe you copied from some 32-bit code?) In particular, mov rcx, 0xffffffff followed by not rcx does not leave 0 in rcx, but rather 0xffffffff00000000. As a result, your strlen returns a wildly incorrect value: the low 32 bits hold the right length, but the high 32 bits are all set.

Change this to mov rcx, -1 or mov rcx, 0xffffffffffffffff.
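As a minimal sketch of the fix, here is the question's strlen with only the constant changed and the arithmetic annotated. The unneeded push/pop of rsi is dropped (scasb does not touch rsi), and the routine assumes the direction flag is clear, as it is on entry under the usual Linux calling convention. For the 6-byte string in the question, the original constant leaves 0xffffffff00000006 in rax after the same steps, which is why only the low 32 bits looked right.

```nasm
; strlen: rdi = pointer to a 0-terminated string; returns its length in rax.
strlen:
    push rdi
    push rcx

    xor  eax, eax        ; al = 0, the byte scasb compares against
    mov  rcx, -1         ; all 64 bits set (0xffffffff only sets the low 32)
    repnz scasb          ; scans len + 1 bytes, so rcx = -1 - (len + 1) = -len - 2
    jnz  .error          ; counter ran out without finding a 0 byte
    not  rcx             ; ~(-len - 2) = len + 1
    dec  rcx             ; len
    mov  rax, rcx
.return:
    pop  rcx
    pop  rdi
    ret
.error:
    mov  rax, -1
    jmp  .return
```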

Nate Eldredge
  • Ah yes, that works. I can't believe my whole thing wasn't working because of such a stupid error. It's strange though, when I printed the output of `strlen` on its own, it showed the correct number. Either way, your answer solves my issue. – mediocrevegetable1 Feb 06 '21 at 17:20
  • Did you print all 64 bits of the result, or just the low 32 bits? The low 32 bits would contain the correct value (for a string less than 4 GB in size). – Nate Eldredge Feb 06 '21 at 17:21
  • Oh yeah, I stored the result in a byte-large variable since I didn't think I would need to test a large string, so you're correct. – mediocrevegetable1 Feb 06 '21 at 17:23
  • @mediocrevegetable1: Yep, there you go. By the way, this was really easy to find by single-stepping with `gdb`, so that's a good thing for you to practice doing if you haven't been. `strace` also helps as it shows a ridiculous length being passed to the `write` system call. – Nate Eldredge Feb 06 '21 at 17:24
  • Yeah, I don't really know how to use a debugger much at all. I guess I never felt the need to since print statements in the correct places worked for me in other languages, but that's not so easy in assembly. – mediocrevegetable1 Feb 06 '21 at 17:26
  • @mediocrevegetable1: Now's the time, I would say. It's an important investment, almost mandatory for assembly programming. Case in point, you spent over an hour waiting for an answer to this question, not to mention the time you spent writing it, and with a debugger you'd have found the bug in three minutes. – Nate Eldredge Feb 06 '21 at 17:30
  • Indeed, and I spent an insane amount of time trying to solve this issue before I even asked this question. – mediocrevegetable1 Feb 06 '21 at 17:34
  • @mediocrevegetable1: It's really not hard. gdb has a ton of functionality, but half a dozen commands will cover 99% of what you want to do. – Nate Eldredge Feb 06 '21 at 17:42
  • Yeah, I'm trying gdb out on a test program right now. Some things are still a bit confusing, but I'm learning basic stuff. – mediocrevegetable1 Feb 06 '21 at 17:45
  • @mediocrevegetable1: the bottom of https://stackoverflow.com/tags/x86/info has some GDB tips for asm. And BTW, if you wanted to assume 32-bit lengths would be sufficient for any string, you could and should have used 32-bit operand-size for your math to actually ignore high bits of the register. e.g. `not ecx` / `lea eax, [rcx - 1]` would have returned a result that discarded any high garbage. And/or `mov edx, eax` would have zero-extended the 32-bit length into RDX. But yes, the 64-bit way would be to just start with `rcx=-1` to make the math work that solves `rcx = -strlen - 2` after rep – Peter Cordes Feb 06 '21 at 23:05
  • @PeterCordes I'll check it out, thanks. As for the registers, yeah, I realised after I found the issue that this whole issue could have been solved had I used 32-bit registers, but I hadn't thought of that before. I'm still learning assembly, so I'm trying to learn which size registers to use in each situation. – mediocrevegetable1 Feb 07 '21 at 06:00
  • Generally use 32-bit for everything, except pointers (or sizes if you want to support huge sizes). Or of course stuff where more bits let you get more useful work done. [The advantages of using 32bit registers/instructions in x86-64](https://stackoverflow.com/q/38303333). In this case (a strlen implementation), it would be most natural to handle the possibility of a 64-bit size. But since you happened to use a 32-bit constant like `0xffffffff`, it's sort of a fun fact that using matching reg sizes would have worked. (Of course an *efficient* strlen would use SSE2, not slow `rep scasb`...) – Peter Cordes Feb 07 '21 at 06:18
  • @mediocrevegetable1: Or just avoid implicit-length strings in the first place, so you always just know how long they are. (`read` return value on input, or have the assembler calculate it for you for static constant strings.) C strings are used for paths for system calls, but other than that there's nothing forcing you to use them. – Peter Cordes Feb 07 '21 at 06:20
  • @PeterCordes interesting, I'll keep that in mind. As for the second comment (about not using 0-terminated strings), indeed, I may just do that. The reason I made a `strlen` and a `puts` function was simply as another challenge of sorts, but in a practical situation, I can see why it would be easier and better to simply add a length label for each string. I'll also check out what SSE2 is. – mediocrevegetable1 Feb 07 '21 at 06:28
  • Yup, makes sense as an exercise. Re: SIMD with SSE2 to check 16 bytes in parallel, and `bsf` bit-scan for the position within the last vector: [Questions about the performance of different implementations of strlen](https://stackoverflow.com/q/34449407) has an example (with AT&T syntax as gcc inline asm, so it's harder to understand, but I think the text of my answer might be useful.) Maybe also [How much faster are SSE4.2 string instructions than SSE2 for memcmp?](https://stackoverflow.com/q/46762813) – Peter Cordes Feb 07 '21 at 06:34
  • IMO if you already know how to program in general (in other languages), one of the ways "thinking in asm" lets you do things that other languages make inconvenient is easily doing things like copying 2 adjacent `int` variables with one qword load / store, or other things like that. So using SIMD even for non-looping things when it can help is fun. – Peter Cordes Feb 07 '21 at 06:36
  • Interesting, I'll try to remember that when using assembly in the future. Thanks for all the advice. – mediocrevegetable1 Feb 07 '21 at 06:41
  • Oh, just remembered that there are some better SSE2 SIMD strlen Q&As: [Why is this code 6.5x slower with optimizations enabled?](https://stackoverflow.com/q/55563598) and [Why does glibc's strlen need to be so complicated to run quickly?](https://stackoverflow.com/a/57676035) – Peter Cordes Feb 07 '21 at 07:03
  • Ah, sorry for the late reply, just saw this. Thanks for the links, I'll check them out. – mediocrevegetable1 Feb 07 '21 at 13:03
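Two of the suggestions from the comments above can be sketched in one small program: letting the assembler compute the length of a static string, and doing the strlen arithmetic with 32-bit operand size so that high garbage never reaches the result. The names `msg_len`, `zmsg` and `strlen32` are made up for this illustration. `strlen32` assumes the string is shorter than 4 GiB and, unlike the question's routine, does no error check.

```nasm
section .data
    msg      db "HELLO", 10          ; explicit-length string, no terminating 0
    msg_len  equ $ - msg             ; length computed by the assembler
    zmsg     db "HELLO", 10, 0       ; 0-terminated string for strlen32

section .text
    global _start
_start:
    ; 1) Explicit length: no strlen call needed.
    mov  eax, 1                      ; write (32-bit mov zero-extends into rax)
    mov  edi, 1                      ; fd 1 = stdout
    mov  rsi, msg                    ; buffer address (a pointer, so 64-bit)
    mov  edx, msg_len                ; length known at assembly time
    syscall

    ; 2) Implicit length, computed with 32-bit math.
    mov  rdi, zmsg
    call strlen32
    mov  edx, eax                    ; zero-extends the 32-bit length into rdx
    mov  rsi, zmsg
    mov  eax, 1                      ; write
    mov  edi, 1                      ; stdout
    syscall

    mov  eax, 0x3c                   ; exit
    xor  edi, edi                    ; status 0
    syscall

; strlen32: rdi = pointer to a 0-terminated string; returns length in eax.
strlen32:
    push rdi
    push rcx
    xor  eax, eax                    ; al = 0, the byte to search for
    mov  ecx, 0xffffffff             ; rcx = 0x00000000ffffffff
    repnz scasb                      ; rcx = 0xffffffff - (len + 1)
    not  ecx                         ; 32-bit NOT: len + 1, high half of rcx zeroed
    lea  eax, [rcx - 1]              ; len
    pop  rcx
    pop  rdi
    ret
```

Both `write` calls should print the same line. Here the 32-bit constant from the question is correct precisely because every later operation on the count also uses 32-bit operand size.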