Following up on my last question about inline asm in gcc (here), I followed the many suggestions I received and dropped inline asm for the moment in favor of assembling the functions as external object files to be linked with the main C program. With the help of godbolt.org and a few instruction set manuals, I wrote a simple function to multiply two integers using bitwise shifting:
.global mul_shift
.section .text
mul_shift:
movl $0, %ebx
jmp test_while
if_test:
movl %esi, %eax
andl $0x01, %eax
jz int_while
movl %edi, %eax
addl %eax, %ebx
int_while:
sall %edi
sarl %esi
test_while:
cmpl $0, %esi
jg if_test
movl %ebx, %eax
ret
I have a few questions, which arise especially when comparing my code (which assembles, links and runs) with the asm that gcc generates when compiling the same function in C (given at the end of the question).
gcc initializes the stack frame with pushq %rbp and movq %rsp, %rbp, then closes it with popq %rbp at the end of the function. It then moves all the variables passed to the function onto the stack. I am aware of why, but in this case, with only 2 input integers, isn't that unnecessary extra work?
Besides the extra instructions needed to initialize the stack frame, how efficient is it to access variables on the stack (that is, in memory) rather than directly in CPU registers? gcc moves all variables to the stack, also initializes the variable to be returned on the stack, and moves it to %eax only just before the function returns. Why not keep it in %eax the whole time? Am I missing some subtleties that will come back to bite me in more complex, bigger programs? Or is it just a working but not optimal implementation on the compiler's side?
I couldn't help but notice that gcc performs the logical and instruction twice on the same variable:
if_test:
movl -24(%rbp), %eax
andl $0x01, %eax # here
testl %eax, %eax # and here?
jz int_while
movl -20(%rbp), %eax
addl %eax, -4(%rbp)
Why is that? Am I missing something here too?
The original C template for the function is the following:
int shift_mul(int a, int b)
{
    int ans = 0;
    while (b > 0)
    {
        if (b & 1)
        {
            ans += a;
        }
        a = a << 1;
        b = b >> 1;
    }
    return ans;
}
EDIT: I fixed the bug pointed out in the comments (replaced %ebx with %edx), and also eliminated some redundant code. The bitwise logical and is now performed by testl rather than andl, so that only the flags are set without changing the value of any register. sarl was changed to shrl to accommodate negative values for int b, and jg if_test was replaced by jne.
.global mul_shift
.section .text
mul_shift:
movl $0, %edx
jmp test_while
if_test:
testl $0x01, %esi
jz int_while
addl %edi, %edx
int_while:
sall %edi
shrl %esi
test_while:
cmpl $0, %esi
jne if_test
movl %edx, %eax
ret
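For comparison, here is a minimal C sketch of what I believe the edited assembly computes. It is only an approximation under stated assumptions: the name mul_shift_ref is made up for illustration, int is assumed to be 32 bits wide, and unsigned types are used so that the right shift is logical (like shrl) and overflow wraps instead of being undefined.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the edited assembly (the name is illustrative,
   not part of the original code). Unsigned types make >> a logical shift,
   like shrl, and make << and += wrap modulo 2^32. */
int mul_shift_ref(int a, int b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t ans = 0;

    while (ub != 0) {      /* cmpl $0, %esi / jne if_test */
        if (ub & 1u)       /* testl $0x01, %esi */
            ans += ua;     /* addl %edi, %edx */
        ua <<= 1;          /* sall %edi */
        ub >>= 1;          /* shrl %esi */
    }
    /* Converting an out-of-range value back to int is implementation-defined
       before C23; gcc wraps modulo 2^32. */
    return (int)ans;
}
```

With a logical shift the loop runs until every set bit of b has been shifted out, so a negative b takes 32 iterations but still produces the wrapped two's-complement product.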