
So, following up on my last question about inline asm in gcc (here), I took the many suggestions I received and set inline asm aside for the moment in favor of assembling the functions as separate object files to be linked with the main C program. With the help of godbolt.org and a few instruction set manuals, I wrote a simple function that multiplies two integers using bitwise shifts:

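```asm
.global mul_shift

.section .text

mul_shift:

    movl $0, %ebx           # ans = 0
    jmp test_while

if_test:

    movl %esi, %eax         # eax = b
    andl $0x01, %eax        # eax = b & 1 (also sets ZF)
    jz int_while            # low bit clear: skip the add
    movl %edi, %eax
    addl %eax, %ebx         # ans += a

int_while:

    sall %edi               # a <<= 1
    sarl %esi               # b >>= 1 (arithmetic shift)

test_while:

    cmpl $0, %esi
    jg if_test              # loop while b > 0 (signed compare)
    movl %ebx, %eax         # return ans
    ret
```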

I have a few questions that arise especially when comparing my code (which assembles, links and runs) with the asm gcc generates when compiling the same function in C (shown at the end of the question).

  1. gcc sets up a stack frame with `pushq %rbp` and `movq %rsp, %rbp`, then tears it down with `popq %rbp` at the end of the function. It also copies all the arguments passed to the function onto the stack. I understand why, but with only 2 integer inputs, isn't that unnecessary extra work?

  2. Besides the extra instructions to set up the stack frame, how efficient is it to access variables on the stack (i.e. in memory) rather than directly in CPU registers? gcc moves all the variables to the stack, and it also keeps the variable to be returned on the stack, moving it into `%eax` only right before the function returns. Why not keep it in a register the whole time? Am I missing some subtleties that will come back to bite me in bigger, more complex programs? Or is it just a working but suboptimal implementation on the compiler's side?

  3. I couldn't help but notice that gcc performs the logical AND twice on the same variable:

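```asm
if_test:

    movl -24(%rbp), %eax        # load b from its stack slot
    andl $0x01, %eax            # here
    testl %eax, %eax            # and here?
    jz int_while
    movl -20(%rbp), %eax        # load a from its stack slot
    addl %eax, -4(%rbp)         # ans += a, directly in memory
```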

Why is that? Am I missing something here too?
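One plausible reading of item 3, offered as a sketch rather than a description of gcc's internals: at `-O0` gcc treats `b & 1` as an ordinary expression whose value it computes into a temporary (the `andl`), and only then applies the `if`'s implicit "is this nonzero?" check to that temporary (the `testl`). Since `andl` already sets ZF, the `testl` is redundant; it survives only because nothing at `-O0` cleans it up. The two helpers below (hypothetical names, not from the question) spell out both forms in C and always agree:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the condition as written in the question's C source. */
static bool low_bit_direct(int b)
{
    if (b & 1)                /* a single C expression... */
        return true;
    return false;
}

/* Hypothetical helper: the same condition split into the two steps
 * that the unoptimized asm performs. */
static bool low_bit_two_steps(int b)
{
    int tmp = b & 1;          /* step 1: materialize b & 1   (andl $0x01, %eax)  */
    return tmp != 0;          /* step 2: compare it with zero (testl %eax, %eax) */
}

/* Usage check: both forms agree for a range of inputs, including negatives. */
static void check_low_bit(void)
{
    for (int b = -16; b <= 16; ++b)
        assert(low_bit_direct(b) == low_bit_two_steps(b));
}
```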

The original C template for the function is the following:

```c
int shift_mul(int a, int b)
{
    int ans = 0;

    while (b > 0)
    {
        if (b & 1)
        {
            ans += a;
        }
        a = a << 1;
        b = b >> 1;
    }
    return ans;
}
```

EDIT: I fixed the bug pointed out in the comments (replaced `%ebx` with `%edx`) and also removed some redundant code. The bitwise AND is now performed by `testl` rather than `andl`, so only the flags are set and no register is modified. `sarl` was changed to `shrl` to handle negative values of `int b`, and `jg if_test` was replaced by `jne`.

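```asm
.global mul_shift

.section .text

mul_shift:

    movl $0, %edx           # ans = 0 (%edx is call-clobbered)
    jmp test_while

if_test:

    testl $0x01, %esi       # set ZF from b & 1 without modifying any register
    jz int_while            # low bit clear: skip the add
    addl %edi, %edx         # ans += a

int_while:

    sall %edi               # a <<= 1
    shrl %esi               # b >>= 1 (logical shift, so a negative b still reaches 0)

test_while:

    cmpl $0, %esi
    jne if_test             # loop while b != 0
    movl %edx, %eax         # return ans
    ret
```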
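For comparison with the C template, here is a rough C model of the edited routine. It is a sketch, not the author's code, and the name `mul_shift_model` is hypothetical. It assumes the System V register mapping `%edi` = a, `%esi` = b, `%edx` = ans, and uses `uint32_t` because unsigned arithmetic wraps modulo 2^32 like the 32-bit registers, and `>>` on an unsigned value is a logical shift like `shrl`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the edited mul_shift (not the author's code).
 * Assumed mapping: a = %edi, b = %esi, ans = %edx. */
static uint32_t mul_shift_model(uint32_t a, uint32_t b)
{
    uint32_t ans = 0;            /* movl $0, %edx                    */
    while (b != 0) {             /* cmpl $0, %esi / jne if_test      */
        if (b & 1u)              /* testl $0x01, %esi / jz int_while */
            ans += a;            /* addl %edi, %edx (wraps mod 2^32) */
        a <<= 1;                 /* sall %edi                        */
        b >>= 1;                 /* shrl %esi: logical shift         */
    }
    return ans;                  /* movl %edx, %eax                  */
}

/* Usage check: the result equals the low 32 bits of the product,
 * including when a negative int is passed in (converted to uint32_t). */
static void check_mul_shift_model(void)
{
    assert(mul_shift_model(6u, 7u) == 42u);
    assert(mul_shift_model((uint32_t)-3, 5u) == (uint32_t)-15);
    assert(mul_shift_model(7u, (uint32_t)-2) == (uint32_t)-14);
    for (uint32_t a = 0; a < 64; ++a)
        for (uint32_t b = 0; b < 64; ++b)
            assert(mul_shift_model(a, b) == a * b);
}
```

Because the shift is logical, a negative `b` reaches zero after at most 32 iterations instead of being rejected by a `b > 0` test.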
Fulvio
  • If you compile without optimization, gcc will keep copying variables from the stack to registers and back. Compile with at least `-O1` or `-Og` to get basic optimizations. Also try `-O2` or `-O3` or `-Os` (or check the man page for individual optimization options). – chtz Jul 30 '22 at 11:01
  • Or, to put it more clearly: by compiling without optimisations, you tell gcc to turn off its brain and generate stupid code. So you had better turn optimisations on if you want to see good code. – fuz Jul 30 '22 at 14:58
  • Thanks to both of you for the comments. @chtz I checked the answer about the temporary variable, and it partially explains what gcc does and when. I tried a few optimization options on godbolt's compiler. I wrote a small function that returned an integer after going through some logic, and when compiled with gcc -O1 it "hardwired" the resulting value, skipping the logic altogether, which scared me a bit. I will keep experimenting. Do you always use optimization when compiling C? Would you suggest that as common practice? – Fulvio Jul 30 '22 at 15:32
  • @Fulvio: As a general rule, use optimization at least for the "release" build, and for all builds to run your test suite. Compile without optimization if you want to minimize compilation time, or if you are going to be running your code under a debugger and want the simplest possible debugging experience. (Unless of course the bug goes away without optimization, which is fairly common; many bugs in C are caused by undefined behavior, and the actual behavior in such cases can be greatly affected by optimization.) – Nate Eldredge Jul 30 '22 at 15:53
  • By the way, you have a bug in your assembly code: you modify the register `%ebx` without restoring it, but this register is required by [the ABI](https://stackoverflow.com/questions/18133812/where-is-the-x86-64-system-v-abi-documented) to be call-preserved. (Besides your instruction set manual, the ABI is another mandatory reference, or at least some document that clearly explains the calling conventions.) The simple fix would be to use `%ecx` or `%edx` in its place, as these are defined as call-clobbered. – Nate Eldredge Jul 30 '22 at 15:56
  • @NateEldredge aha! Thanks for pointing the bug out. Now I understand the concept of callee- and caller-saved registers better. So when a program calls a function from an external obj file, that function has to restore the machine state the caller relies on (the callee-saved registers) before passing control back, right? Is that a common source of issues in bigger programs? – Fulvio Jul 30 '22 at 16:24
  • Yes, the called function has to comply with all calling conventions as to what machine state must be saved and restored. I don't think it's a particularly common source of issues; in high-level languages the compiler takes care of it automatically, and for the relatively rare programs that contain handwritten assembly, it's something that assembly programmers quickly learn to pay attention to. But it's true that such bugs, when they occur, can be subtle and hard to track down. – Nate Eldredge Jul 30 '22 at 17:05
  • @Fulvio: Yes, see [What does it mean that "registers are preserved across function calls"?](https://stackoverflow.com/q/63865026) re: what it means to follow a calling convention. Like [What registers must be preserved by an x86 function?](https://stackoverflow.com/q/9603003) / [What registers are preserved through a linux x86-64 function call](https://stackoverflow.com/q/18024672) – Peter Cordes Jul 30 '22 at 20:19
  • Your C function is weird. It uses signed `int b`, but bails out without doing anything if it's negative (and it uses an arithmetic right shift, so it would loop forever on negative inputs otherwise). The low half (equal to the input width) of a multiplication is the same for unsigned vs. 2's complement (hence x86 has `imul reg, reg` but no `mul reg, reg`; unsigned `mul` exists only in the widening one-operand form). So you could have used `unsigned` and just had the caller do implicit conversion between int and unsigned (which is a no-op for 2's complement) for the inputs and output. – Peter Cordes Jul 30 '22 at 20:23
  • @PeterCordes you are right, thanks for pointing that out. I realize now that the asm code has the same issue and breaks for negative b. I am going to find a way to fix it and will edit the question with the code. – Fulvio Jul 30 '22 at 20:38
  • The obvious thing is just to `shr` until `b` becomes `0`. – Peter Cordes Jul 30 '22 at 20:39
  • @PeterCordes I just tried it out; it still returns 0 when b is negative. – Fulvio Jul 30 '22 at 20:44
  • If you didn't change `jg if_test` to `jne`, then of course it does. I said "until b *becomes* zero", meaning `do {} while(b != 0)`. Single-step your code in a debugger to see the path of execution. If you did change that, then maybe you have some other bug in your asm as well. Again, use a debugger, they're extremely helpful for assembly. – Peter Cordes Jul 30 '22 at 20:45
  • @PeterCordes which debugger would you suggest? I haven't got one yet. I'm on Linux and use gcc and gas. P.S. The changes you suggested work! – Fulvio Jul 30 '22 at 20:51
  • GDB works, although there are nicer front-ends and mods for it if you want to get fancy. See the bottom of https://stackoverflow.com/tags/x86/info for asm debugging tips. It's essential that you use some debugger, though, otherwise you're wasting huge amounts of time on things that become obvious when you single-step and watch register values change. – Peter Cordes Jul 30 '22 at 20:55
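Peter Cordes's point that the low half of a multiplication is the same for unsigned and two's complement can be checked directly. The sketch below uses hypothetical helper names and `int32_t`, which is two's complement by definition. It compares the low 32 bits of the exact signed product, computed in 64 bits, with a plain 32-bit unsigned multiply. The results are identical, which is why a single non-widening `imul reg, reg` serves both signednesses and why an `unsigned` version of the function could serve signed callers too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: low 32 bits of the exact signed product.
 * The 64-bit multiply cannot overflow for 32-bit inputs. */
static uint32_t low32_of_signed_product(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * (int64_t)b);  /* conversion to uint32_t wraps mod 2^32 */
}

/* Hypothetical helper: a plain 32-bit unsigned multiply, which also wraps mod 2^32. */
static uint32_t low32_of_unsigned_product(int32_t a, int32_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Usage check over a few edge cases, including negatives and INT32_MIN/MAX. */
static void check_low_half_identity(void)
{
    static const int32_t vals[] = { 0, 1, -1, 7, -7, 123456, -98765, INT32_MAX, INT32_MIN };
    const size_t n = sizeof vals / sizeof vals[0];
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            assert(low32_of_signed_product(vals[i], vals[j]) ==
                   low32_of_unsigned_product(vals[i], vals[j]));
}
```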
