4

I'm new to assembly (Intel x86_64) and I am trying to recode some functions from the C library. I am on a 64-bit Linux and compiling with NASM.

I have an error with the strchr function and I can't find a solution...

As a reminder here is an extract from the man page of strchr :

char *strchr(const char *s, int c);

The strchr() function returns a pointer to the first occurrence of the character c in the string s.

Here is what I tried :

strchr:
    push rpb
    mov rbp, rsp

    mov r12, rdi ; get first argument
    mov r13, rsi ; get second argument

    call strchr_loop


strchr_loop:
    cmp [r12], r13 ; **DON'T WORK !** check if current character is equal to character given in parameter... 
    je strchr_end  ; go to end

    cmp [r12], 0h  ; test end of string
    je strchr_end  ; go to end

    inc r12        ; move to next character of the string

    jmp strchr_loop ; loop


strchr_end
    mov rax, r12    ; set function return

    mov rsp, rbp
    pop rbp

This return a pointer on the ned of the string and don't find the character...

I think it's this line which doesn't work :

    cmp [r12], r13

I tested with this and it worked :

    cmp [r12], 97 ; 97 = 'a' in ASCII

The example :

char *s;

s = strchr("blah", 'a');
printf("%s\n", s);

Returned :

ah

But I can't make it work with a register comparison. What am I doing wrong, and how can I fix it?

Michael Petch
  • 46,082
  • 8
  • 107
  • 198
SomeRaven
  • 61
  • 1
  • 8
  • 6
    `r13` is a 64-bit register. Your characters are one byte each. Perhaps you meant to use `r13b`. – Michael Mar 14 '16 at 21:45
  • 4
    You need to save and restore the callee-saved registers you clobber. All calling x86-64 calling conventions I know of require that `r12` and `r13` be preserved. – EOF Mar 14 '16 at 21:50
  • @MichaelPetch Yes sorry, I forgot to mention it, I edited my post – SomeRaven Mar 14 '16 at 22:09
  • @Michael Oh ok ! I thought that it was something like that, but I didn't know about this registers ^^' – SomeRaven Mar 14 '16 at 22:10
  • @EOF I didn't find many documentation about which registers to use... I'm from C and I'm a little lost without the possibility to create variables ^^' Could you show me some link about which registers I can use, and which I can't ? – SomeRaven Mar 14 '16 at 22:12
  • @SomeRaven: I only really understood assembly when I learned about ABI's and calling conventions. Without them, registers are a confusing mess. – EOF Mar 14 '16 at 22:14
  • 4
    The best place to get information on the calling convention for 64-bit code on Linux is to review the System V 64-Bit ABI. In particular if you want a list of caller/callee registers and their usage, then they can be found in [Figure 3.4 Register Usage](http://www.x86-64.org/documentation/abi.pdf) on page 21. – Michael Petch Mar 14 '16 at 22:15
  • @MichaelPetch Thanks you I will take a look at it tomorrow ! – SomeRaven Mar 14 '16 at 22:16
  • 2
    Prefer using eax/ecx/edx for scratch registers that don't need to be saved/restored, because instructions using them don't need a REX prefix. No speed diff other than code size, but code-size matters some. And you can just increment `rsi`; there's no need to copy it to another register. See the [x86 tag wiki](http://stackoverflow.com/tags/x86/info), esp Agner Fog's guides. – Peter Cordes Mar 14 '16 at 23:48
  • 1
    Single-stepping in a debugger is essential, BTW. Have you tried that, instead of just looking at return values? – Peter Cordes Mar 14 '16 at 23:55
  • @PeterCordes Thanks for your help ! But could you explain a little more about the REX prefix ? And I debugged with gdb (I'm new to it btw) and I find that values are correct at every step but the comparison doesn't work so it continue till the end of the string. – SomeRaven Mar 15 '16 at 07:50
  • @SomeRaven: yup, that's usually what you get when debugging asm: everything is what you expect up until one instruction. Your case doesn't make it obvious what the error was, since you used the wrong operand-size in a compare. If it was something else, then you'd more easily be able to see "oh, I loaded or stored 8 bytes". – Peter Cordes Mar 15 '16 at 23:45
  • RE: REX prefix bytes: instructions that access r8-r15, or that use the low byte of si/di/bp/sp (`sil` / `bpl` ...), or that use a 64bit operand-size (`mov rax, rdi`) need an extra byte of machine-code. Don't worry about it until writing code that works at all isn't your main problem: there's enough to worry about just trying to understand registers, instructions, flags, memory, ABIs/calling conventions, etc. without worrying about optimizing for code-size. For now, try to minimize the number of branches, and the number of instructions executed on any one path through your function. – Peter Cordes Mar 15 '16 at 23:47
  • Actually it's hard to give good general rules of thumb for writing code that will run well. Keep your data in registers as much as possible, even though that conflicts with minimizing number of instructions. It's better to load and do stuff to a register, then store, instead of using a memory operand repeatedly. e.g. `cmp [rdi], al / je` / `cmp [rdi], 0 / jz` isn't great, because you could have loaded once and reused the data in a register. Just read Agner Fog's ASM guide if you want more advice like this. Otherwise just worry about making correct programs first :P – Peter Cordes Mar 15 '16 at 23:51

2 Answers2

1

First, thanks for your help ! I think I have a better understanding of what I am doing.

I was stuck with the problem of receiving a 8bits parameter instead of the 64bits rdi... But a friend shows me that the first 8bit parameter is also in the sil register.

So here's my working code :

strchr:

   push rpb
   mov rbp, rsp

   call strchr_loop


strchr_loop:
    cmp byte [rdi], sil ; check if current character is equal to character given in parameter
    je strchr_end  ; go to end

    cmp byte [rdi], 0h  ; test end of string
    je strchr_end  ; go to end

    inc rdi        ; move to next character of the string

    jmp strchr_loop ; loop


 strchr_end
    mov rax, rdi    ; set function return

    mov rsp, rbp
    pop rbp

Please feel free to tell me if there is a way to improve it and thanks again !

SomeRaven
  • 61
  • 1
  • 8
  • 3
    You have a stray `call` instruction. The only reason your code still works is that you're restoring `rsp` from `rbp`, so the return-address pushed by `call strchr_loop` is thrown away. This will give you a large performance penalty for a mismatched call/return. Just remove that line to let execution fall into the top of the loop naturally. – Peter Cordes Mar 15 '16 at 23:54
  • 1
    And like I commented on the question, better code would be: `mov al, [rdi] / test al,al/jz / cmp al,sil/je` Or maybe `movzx eax, byte [rdi] / test eax,eax/jz / cmp eax,esi/je`. Your way isn't too bad, though, since you're not doing any read-modify-writes to memory. You also don't need to make a stack frame with `rbp`; that's optional. – Peter Cordes Mar 15 '16 at 23:59
  • @PeterCordes Thanks for your advice ! I finally removed ` call strchr_loop`. It was like a call to a function for me, but I understand that it's useless in this case. What do you mean by "make a stack frame with rbp" ? The push et pop that I did in the beginning and the end of the code ? I don't really understand these lines, they are from the only example of my assembly lesson ^^' – SomeRaven Mar 18 '16 at 07:20
  • 1
    By "making a stack frame", yeah, I meant the stuff with `rbp`. See http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64. I skimmed it briefly. `-fomit-frame-pointer` has been the gcc default for a while now, since metadata in the executable makes backtracing possible other than by following the linked list of stack frames created by pushing `rbp`. – Peter Cordes Mar 18 '16 at 07:33
  • @PeterCordes Oh ok, I think I understand ! I removed those lines and it's still working, so it means 4 lines less. Thanks again ! – SomeRaven Mar 18 '16 at 08:43
0

Here is a fix for your assembly code, which implements the strchr(3), in x86-64 Assembly as defined in the man pages:

asm_strchr:
        push rbp
        mov rbp, rsp

strchr_loop:
        cmp byte [rdi], 0  ; test end of string
        je fail_end     ; go to end

        cmp byte [rdi], sil ; check if current character is equal to character given in parameter
        je strchr_end   ; go to end

        inc rdi         ; move to next character of the string
        jmp strchr_loop         ; loop

strchr_end:
        mov rax, rdi            ; set function return
        mov rsp, rbp
        pop rbp
        ret

fail_end:
        mov rax, 0
        mov rsp, rbp
        pop rbp
        ret
Bourhano
  • 63
  • 6
  • 1
    Why include the prologue (1x) and epilogue (2x) if the code doesn't use it? You can efficiently zero `RAX` with `xor eax, eax` – Sep Roland Dec 14 '20 at 22:49
  • You didn't fix any of the other inefficiencies in the OP's code (like [loop structure](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-into-do-while-style-tail-jump)), or accessing memory twice. Duplicating the tail makes the code-size cost of messing with RBP worse. Of course, there's no way to write an efficient `strchr` that only checks 1 byte at a time; on x86-64 it's almost always worth using SSE2 `pcmpeqb`, unless you expect to find the character within the first few bytes, so optimizing a byte-at-a-time `strchr` is mostly just an exercise. – Peter Cordes Dec 14 '20 at 23:00