0

I'm trying to implement a function int string_cmp(const char* str1, char* str2) that takes in 2 strings and I'm supposed to return 1 if str1 is greater than str2 alphabetically, 0 if they are the same and -1 if str1 is smaller.

I'm a newbie in assembly, so I have almost no clue on how can I implement this function correctly, here is what I have so far:

int string_cmp(const char* str1, char* str2) {
    int result;
    asm volatile (
        "mov %[src], %%rsi\n"     
        "mov %[dst], %%rdi\n"      
        );
    }

I just want to clarify that the reason I have a very very tiny portion of a far from completed code is because I genuinely don't know how to proceed from this point onwards, so if anyone could give me an explanation or show me a sample code that would work as per the requirements that I have mentioned above, it would be really helpful to me and others struggling to learn as well.

Thank you.

Edit: Here is what I have so far:

int string_cmp(const char* str1, char* str2) {
    int result;

    asm volatile (
        "cld\n"                 // Clear the direction flag to move forward

        "1:\n"
        "lodsb\n"               
        "scasb\n"               // Compare the byte in al with the byte at [rdi] and increment rdi

        "jne 2f\n"              
        "test al, al\n"        
        "jne 1b\n"             

        "xor eax, eax\n"        
        "jmp 3f\n"              
        "2:\n"
        "sbb eax, eax\n"        // Set eax to -1 if the strings are not equal

        "3:\n"
        : [src] "+r" (str1),    
        "=a" (result)         // Output: store the result in the 'result' variable
        : [dst] "D" (str2)      // Input: str2 is in the rdi register
        : "cc"                  // Clobbered flags
        );

    return result;
}
Deezel
  • 111
  • 6
  • 1
    You're checking for 2 things: a mismatch, or one of the strings have a `0` byte. (You only need to check one, since the other will mismatch if it didn't also end.) So like an `if()break` (cmp/jne) inside a `do{}while()` loop. I'd recommend writing a whole function instead of using inline `asm` - describing your asm precisely to the compiler for it to inline is an extra set of challenges. And you're already starting out badly, with two useless `mov` instructions instead of asking the compiler for pointers in registers. – Peter Cordes Jul 12 '23 at 17:12
  • 1
    Also maybe related: [Looping over arrays with inline assembly](https://stackoverflow.com/q/34244185) and https://stackoverflow.com/tags/inline-assembly/info if you insist on using inline asm. – Peter Cordes Jul 12 '23 at 17:13
  • @PeterCordes thank you for ur suggestion, but I'm not sure what do you mean by "two useless mov instructions instead of asking the compiler for pointers in registers.", can you clarify or maybe give a sample code to demonstrate what you mean? – Deezel Jul 12 '23 at 17:32
  • 1
    Like I mentioned in my answer to your [previous question](https://stackoverflow.com/questions/76672242/how-can-i-copy-a-string-from-a-source-to-destination-using-in-line-assembly), `[src] "+r"(source)` will get the compiler to put the pointer in a register for you. Use `inc %[src]` etc. It was probably already there as a function arg so this costs the compiler zero instructions. (Look at the compiler-generated asm for your inline-asm function (or a caller that inlines it) on https://godbolt.org/ / [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116).) – Peter Cordes Jul 12 '23 at 18:45
  • 1
    If your asm statement starts or ends with a `mov`, usually you're doing it wrong and should have just told the compiler where you wanted that input variable. – Peter Cordes Jul 12 '23 at 18:47
  • @PeterCordes I've taken your advice into consideration and wrote up the code in my original question. (I edited it into the post itself). Are there any suggestions that you would make to improve this? – Deezel Jul 13 '23 at 08:20
  • You're modifying a register you told the compiler was a read-only input (`"D"(str2)` but `scasb` modifies RDI. Use `"+D"` and `"+S"`). And you haven't told the compiler that [src] needs to pick RSI; you used `"+r"` to give it a choice of any register but then only used RSI implicitly in the asm template, not `%[src]` with instructions that would still work even if it picked RDX or whatever. Also, [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) - easiest way is a `"memory"` clobber. – Peter Cordes Jul 13 '23 at 17:42
  • Your output code is also not efficient and not correct: `sbb eax,eax` sets EAX=-CF. But CF will be zero if the byte in the first string was not below the byte in the 2nd string. (So you're returning `[rsi] < [rdi]` rather than `[rsi] != [rdi]` for the mismatching byte). If you start with `xor eax,eax` you'll break the initial false dependency on merging a load into the bottom of RAX (`lodsb`), and you can use `setne al` to materialize a `0` or `1` boolean in AL, zero-extended to EAX since you zeroed RAX earlier. – Peter Cordes Jul 13 '23 at 17:45

0 Answers0