1  .text
2  .globl mystery
3  .type mystery, @function

4  mystery:
5  movq %rdi, %rax
6  movq %rdi, %rdx
7  cmpb $0, (%rdi)
8  je .L5

9  .L3:
10 addq $1, %rdx
11 cmpb $0, (%rdx)
12 jne .L3

13 .L5:
14 addq $1, %rsi
15 movb -1(%rsi), %cl
16 addq $1, %rdx
17 movb %cl, -1(%rdx)
18 cmpb $0, %cl
19 jne .L5

20 ret

I am really confused about this one. Is it an implementation of strcpy or strcat? I understand that the first loop searches for the null terminator of a string, and that the second loop does the same while also copying the string, but I am having a hard time wrapping my head around what the entire function is doing. Is it copying the string stored at %rdi to another location (strcpy), or is it appending the string stored at %rsi to the end of the string at %rdi (strcat)?

General Grievance
  • It's hard to read code formatted that way. Use a ```code block``` (triple backticks) without line numbers. – Peter Cordes Apr 29 '23 at 17:55
  • It's reading from memory at (%rdx) in the first loop, then writing there in the second loop. And it doesn't reset RDX between loops. (IDK why it copies RDI to RDX in the first place, so it would be even easier to keep track of the fact that it's the first function arg that's being searched and then appended to.) – Peter Cordes Apr 29 '23 at 17:58
  • Keeping an unchanged copy of the function arg for debugging purposes you mean? That's remotely possible, but it already has a copy in RAX (which will become the return value, since C string functions are [poorly designed to return their first arg instead of something useful like a pointer to the end of the string](https://stackoverflow.com/questions/3561427/strcpy-return-value).) If this is actual GCC output, it's a missed-optimization bug, since it doesn't actually use the original RDI later. And if it did, it still has the value in RAX. – Peter Cordes Apr 29 '23 at 18:18
  • Hint: Does `strcpy` ever read from its destination like this does? – Peter Cordes Apr 29 '23 at 18:26
  • You can tell which function it is just by looking at the number of arguments. Furthermore, the three `cmpb $0, xxx` instructions are spot on. – Margaret Bloom Apr 29 '23 at 18:28
  • Also, this probably isn't actual GCC output, at least not with `-O2` or higher. GCC would definitely use `test %cl, %cl` instead of `cmpb $0, %cl`. If it's GCC `-O1`, missed-optimizations aren't bugs, just things the compiler intentionally didn't spend time looking for. – Peter Cordes Apr 29 '23 at 18:28
  • I can get pretty close to reproducing the asm with GCC 4.8.4 `-O2` https://godbolt.org/z/3xG37nE9n (spoiler alert on the function name and C source). It doesn't peel the first iteration of the first loop (so the store offsets are different), but the redundant copying of `%rdi` to `%rdx` is there. It uses `movzbl -1(%rsi), %ecx` to load the src string in the 2nd loop, and `test %cl, %cl`, so probably this assignment was actual GCC output "simplified" by hand to use a byte load, and `cmp` instead of `test`. `-fno-peephole` didn't stop GCC from doing that. – Peter Cordes Apr 29 '23 at 18:49
  • @adawg please roll it back when they are done. How long should it take? A week? – vvv444 Apr 30 '23 at 04:20

1 Answer


With pseudo-C comments

    .text
    .globl mystery
    .type mystery, @function

mystery:
    movq %rdi, %rax      # rax = rdi
    movq %rdi, %rdx      # rdx = rdi
    cmpb $0, (%rdi)      # *(char*)rdi == 0 ?
    je .L5               # Jump if equal (zero)

.L3:
    addq $1, %rdx        # ++rdx
    cmpb $0, (%rdx)      # *(char*)rdx == 0 ?
    jne .L3              # Jump if not equal (non zero)

.L5:
    addq $1, %rsi        # ++rsi
    movb -1(%rsi), %cl   # cl = *((char*)rsi - 1)
    addq $1, %rdx        # ++rdx
    movb %cl, -1(%rdx)   # *((char*)rdx - 1) = cl
    cmpb $0, %cl         # Is cl == 0 ?
    jne .L5              # Jump if not equal

    ret

So rewriting it into C code:

char *func(char *rdi, const char *rsi) {
    char *dest = rdi;         // rdx
    const char *src = rsi;    // rsi
    char c;

    // Find first '\0' in the destination
    if (*dest != 0)
        while (*++dest != 0)
            ;

    // Copy from src until '\0' reached (the terminator is copied too)
    do {
        c = *src++;
        *dest++ = c;
    } while (c != 0);

    return rdi; // rax
}

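So we see that this is strcat(): it first skips to the end of the existing string in the destination (%rdi), then copies the source string (%rsi) there, terminator included, and returns the original destination pointer.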

vvv444
  • In the asm, the `.L3` loop is also a do-while structure, probably the result of a compiler turning a `while(){}` loop into an `if() do{}while()` loop ("loop inversion optimization") by peeling the first iteration. See [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926). But yeah, writing it this way in the source does actually get GCC 4.9 -O1 to make almost exactly that asm source: https://godbolt.org/z/hn64s1WeG different only in label numbering (.L4 vs. .L5) and in using `movzbl` and `testb` instead of `movb` and `cmpb`. – Peter Cordes Apr 29 '23 at 19:49
  • True. I didn't try to stick to some exact code constructs, just to produce some logically equivalent code. And obviously it's almost impossible to get the same assembly; it depends on the specific compiler and its optimizations :) – vvv444 Apr 29 '23 at 20:36
  • Godbolt has lots of GCC versions installed. The code in the question is probably hand-edited by an instructor to "simplify" `movzbl` byte loads into `movb` to a partial register, and `testb` to `cmpb`. But other than that and one label name, GCC4.9 -O1 reproduced it right down to the register allocation, including the instruction scheduling and all the load offsets. Probably with different source or a different GCC version, we could get `.L5`. It's probably not too ancient a GCC version; GCC4.1 and presumably earlier used `inc` instead of `add $1`. – Peter Cordes Apr 29 '23 at 20:43
  • I wondered if `gcc -Os` could reproduce the `movb` instead of `movzbl`, but no, and would definitely use `test` instead of `cmp`. Anyway, as you say, that's not really important, it's more just idle curiosity for people like myself that have looked at a lot of GCC output and recognize some of its typical code-gen choices. – Peter Cordes Apr 29 '23 at 20:43
  • @PeterCordes, Yep. Thanks for the insights! I didn't know some of these details like optimizer peeling first loop iteration :) – vvv444 Apr 30 '23 at 04:16