Why does the assembly code copy the value from %edx to %rcx before adding to the sum?

Question

Compiling with x86-64 gcc -Og -std=gnu99 -xc.

In the second line of .L3 (addl (%rdi,%rcx,4), %eax), why not just use the register %edx directly when adding to the sum, like this?

addl (%rdi,%edx,4), %eax

int sum_arr(int arr[], int nelems) {
  int sum = 0;
  for (int i = 0; i < nelems; i++) {
    sum += arr[i];
  }
  return sum;
}

sum_arr:
        movl    $0, %edx
        movl    $0, %eax
        jmp     .L2
.L3:
        movslq  %edx, %rcx
        addl    (%rdi,%rcx,4), %eax
        addl    $1, %edx
.L2:
        cmpl    %esi, %edx
        jl      .L3
        rep ret
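To make the listing easier to follow, here is a hand transliteration of it into C, one statement per instruction, with variables named after the registers. It is a sketch of what the instructions do, not something gcc generated. The name sum_arr_asm_model is hypothetical, and the byte arithmetic assumes sizeof(int) == 4 as on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the -Og listing instruction by instruction. */
int sum_arr_asm_model(const int *rdi, int esi) {
    int edx = 0;       /* movl   $0, %edx        i = 0                  */
    int eax = 0;       /* movl   $0, %eax        sum = 0                */
    int64_t rcx;       /* scratch register for the 64-bit index         */
    goto L2;           /* jmp    .L2             test before first pass */
L3:
    rcx = (int64_t)edx;                 /* movslq %edx, %rcx            */
    /* addl (%rdi,%rcx,4), %eax: address = base + index * scale (4)    */
    eax += *(const int *)((const char *)rdi + rcx * 4);
    edx += 1;          /* addl   $1, %edx        i++                    */
L2:
    if (edx < esi)     /* cmpl   %esi, %edx                             */
        goto L3;       /* jl     .L3             signed comparison      */
    return eax;        /* rep ret: the result is returned in %eax       */
}
```

The only statement without a direct counterpart in the C source is the rcx assignment: it exists purely to produce a 64-bit index for the addressing mode.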

`-Og` is designed for debuggability first and speed second. This *probably* means that entire debug-unsafe optimisation *passes* don't run. If such a pass contains a few applicable optimisations that are actually debug-safe *in your particular case*, well, they will have to wait until `-O2`. — n. m. could be an AI, Aug 01 '21 at 10:59

score 4 · Accepted Answer · answered Aug 01 '21 at 17:04

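As 4386427's previous answer pointed out, you cannot mix 32- and 64-bit registers in an effective address: the address size is a single attribute of the instruction and applies to the base and index registers alike. The CPU doesn't support mixing them, so addl (%rdi,%edx,4), %eax is not encodeable.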

To use i as the index part of an effective address, we need it in a 64-bit register. Since i is of type int, which is signed, the compiler sign-extends it with movslq (movsxd in Intel syntax). It uses a separate register, %rcx, so that %edx can continue to hold the value of the variable i, making it easier for a debugger to inspect that value (e.g. print i in gdb).
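The difference between sign-extension (what movslq does) and the zero-extension that an ordinary 32-bit write leaves in the full 64-bit register can be shown with plain C conversions. This is a minimal sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Like movslq %edx, %rcx: bit 31 is copied into bits 32..63. */
int64_t sign_extend32(int32_t x) {
    return (int64_t)x;
}

/* Like the upper half left behind by a 32-bit write such as movl or addl
 * to %edx: bits 32..63 of %rdx become zero. */
uint64_t zero_extend32(int32_t x) {
    return (uint64_t)(uint32_t)x;
}
```

For a negative index such as -1, the two differ: sign-extension gives -1, so the address is %rdi - 4 (arr[-1]), while zero-extension gives 0xFFFFFFFF, an address about 16 GiB past arr. That is why, in general, the compiler must sign-extend an int before using it as an index.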

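As it turns out, we can prove that i will always be nonnegative in this function. The initial movl $0, %edx also zeros the high half of %rdx, and it stays zero from then on: every write to a 32-bit register, including addl $1, %edx, zeros the upper 32 bits of the full 64-bit register. And since the loop body only runs while i < nelems, i is at most INT_MAX - 1 before the increment, so i never overflows to a negative value. For a nonnegative int, zero-extension and sign-extension give the same 64-bit value, so %rdx does always contain the correct 64-bit value of the variable i. Thus we could have used addl (%rdi,%rdx,4), %eax instead and omitted the movslq. The compiler probably didn't make that deduction at this level of optimization, though.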
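The key step of the argument, that zero- and sign-extension agree exactly for nonnegative values, can be spot-checked in C. This is a brute-force sketch over the index sequence sum_arr actually visits, not a proof; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the zero-extended 32-bit pattern of x (what %rdx holds)
 * equals the sign-extended value (what movslq puts in %rcx). */
bool extensions_agree(int32_t x) {
    return (uint64_t)(uint32_t)x == (uint64_t)(int64_t)x;
}

/* Walks the same indices as sum_arr (0, 1, ..., nelems - 1) and checks
 * that %rdx could have stood in for %rcx at every iteration. */
bool index_invariant_holds(int32_t nelems) {
    for (int32_t i = 0; i < nelems; i++) {
        if (!extensions_agree(i))
            return false;
    }
    return true;
}
```

extensions_agree is true for every value from 0 to INT32_MAX and false for every negative value, which is exactly why the nonnegativity of i is what the optimization would hinge on.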

(It is also possible to use all 32-bit registers in an effective address with an address-size override prefix, so addl (%edi,%edx,4), %eax is an encodeable instruction, but it wouldn't work here: the address would be computed in 32 bits, discarding the high 32 bits of the pointer arr in %rdi. For this reason the address-size override is hardly ever useful in 64-bit code.)
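The truncation can be modelled arithmetically. The sketch below assumes a zero segment base (as for ordinary data accesses in 64-bit mode) and uses hypothetical helper names: ea64 computes base + index * scale in 64 bits, while ea32 mimics the 0x67 address-size override, which computes the address in 32 bits and zero-extends the result.

```c
#include <assert.h>
#include <stdint.h>

/* Normal 64-bit addressing, e.g. addl (%rdi,%rcx,4), %eax. */
uint64_t ea64(uint64_t base, uint64_t index, unsigned scale) {
    return base + index * scale;
}

/* 32-bit addressing via the 0x67 prefix, e.g. addl (%edi,%edx,4), %eax:
 * only the low 32 bits of base and index are used, the sum wraps mod
 * 2^32, and the result is zero-extended to 64 bits. */
uint64_t ea32(uint64_t base, uint64_t index, unsigned scale) {
    uint32_t b = (uint32_t)base;
    uint32_t i = (uint32_t)index;
    return (uint64_t)(uint32_t)(b + i * scale);
}
```

With a hypothetical array address of 0x00007FFFF7A00000 (above 4 GiB, as heap and stack addresses commonly are on x86-64 Linux), ea64(base, 3, 4) is 0x00007FFFF7A0000C but ea32(base, 3, 4) is 0xF7A0000C, an unrelated address. Only when arr happens to lie below 4 GiB do the two agree.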

