My example is very simple:

extern int fun(int arr[]);
int foo(void)
{
        int arr[64] = {0};
        return fun(arr);
}

When I compile it using clang, the asm code is very straightforward:

foo:                                    # @foo
        .cfi_startproc
# %bb.0:
        subq    $264, %rsp              # imm = 0x108
        .cfi_def_cfa_offset 272
        xorps   %xmm0, %xmm0
        movaps  %xmm0, 240(%rsp)
        movaps  %xmm0, 224(%rsp)
        movaps  %xmm0, 208(%rsp)
        movaps  %xmm0, 192(%rsp)
        movaps  %xmm0, 176(%rsp)
        movaps  %xmm0, 160(%rsp)
        movaps  %xmm0, 144(%rsp)
        movaps  %xmm0, 128(%rsp)
        movaps  %xmm0, 112(%rsp)
        movaps  %xmm0, 96(%rsp)
        movaps  %xmm0, 80(%rsp)
        movaps  %xmm0, 64(%rsp)
        movaps  %xmm0, 48(%rsp)
        movaps  %xmm0, 32(%rsp)
        movaps  %xmm0, 16(%rsp)
        movaps  %xmm0, (%rsp)
        movq    %rsp, %rdi
        callq   fun
        addq    $264, %rsp              # imm = 0x108
        .cfi_def_cfa_offset 8
        retq

gcc generates shorter asm code, but I can't really understand it. Here is the code from gcc:

foo:
.LFB0:
        .cfi_startproc
        subq    $264, %rsp
        .cfi_def_cfa_offset 272
        xorl    %eax, %eax
        movl    $32, %ecx
        movq    %rsp, %rdi
        rep stosq
        movq    %rsp, %rdi
        call    fun@PLT
        addq    $264, %rsp
        .cfi_def_cfa_offset 8
        ret

I am an asm novice, so I apologize if my question is stupid. Any guidance is welcome.

  • [`rep stosq`](https://www.felixcloutier.com/x86/REP:REPE:REPZ:REPNE:REPNZ.html) = `memset(rdi, rax, rcx*8)`. Neither of these are obviously great choices; that's maybe too much unrolling from clang, and ERMSB doesn't make `rep stos` that great. But a medium-sized memset is a hard problem on modern x86; it's small enough that `rep stos` startup overhead matters. – Peter Cordes May 29 '19 at 08:29
  • Good point. It is clear if "rep stosq" means "call memset" – Mr Pang May 29 '19 at 08:33
  • It doesn't actually `call` memset, there's no `call` instruction. It invokes the optimized microcode inside the CPU that implements `rep stos`. (And leaves RDI and RCX modified.) – Peter Cordes May 29 '19 at 08:35
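
To make the comments above concrete, here is a rough C model of what `rep stosq` does when the direction flag is clear. It is only a sketch: the names `model_rep_stosq` and `zero_like_gcc` are made up for illustration, and the real instruction runs as microcode inside the CPU rather than as a loop of separate stores. The model returns the final RDI value, which shows why gcc has to repeat `movq %rsp, %rdi` before the call: RDI no longer points at the start of the array afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of `rep stosq` with the direction flag clear (DF = 0).
   Returns the final RDI value; the real instruction also leaves RCX at 0. */
static uint64_t *model_rep_stosq(uint64_t *rdi, uint64_t rax, size_t rcx)
{
    while (rcx != 0) {
        *rdi++ = rax;  /* store the 8 bytes of RAX at [RDI], then RDI += 8 */
        rcx--;         /* one fewer qword left to store */
    }
    return rdi;
}

/* Mirrors gcc's setup for `int arr[64] = {0}`: eax = 0, ecx = 32, rdi = rsp.
   Returns 1 if every qword of the buffer ends up zero, 0 otherwise. */
int zero_like_gcc(void)
{
    uint64_t buf[32];               /* 32 qwords = 256 bytes = int[64] with 4-byte int */
    memset(buf, 0xFF, sizeof buf);  /* poison first so the zeroing is observable */

    uint64_t *end = model_rep_stosq(buf, 0, 32);
    assert(end == buf + 32);        /* "RDI" now points one past the array */

    for (size_t i = 0; i < 32; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Calling `zero_like_gcc()` from a `main` of your own should return 1. The count of 32 in `%ecx` comes from 64 ints × 4 bytes = 256 bytes = 32 qwords, the same 256 bytes that clang covers with sixteen 16-byte `movaps` stores.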
