This is craptastic un-optimized code because you compiled with -O0 (compile fast, skip most optimization passes). The legacy stack-frame setup / cleanup is just noise. The arg is on the stack right above the return address, i.e. at 4(%esp) on function entry. (See also How to remove "noise" from GCC/clang assembly output?)
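For reference, the source was presumably something like this (an assumption on my part; the question's C isn't shown here, but the asm loads the first stack arg and returns it times 34):

int func(int a) {
    return a * 34;   /* the asm versions below compute this as a*32 + a*2, or (a*16 + a)*2 */
}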
It's surprising to see a compiler use 3 instructions to multiply by shifting and adding, instead of an imull $34, 4(%esp), %eax / ret, unless it's tuning for old CPUs. 2 instructions is the cutoff for modern gcc and clang with their default tuning. See for example How to multiply a register by 37 using only 2 consecutive leal instructions in x86?
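The trick in that linked question boils down to a decomposition like 37 = 4*9 + 1, where each step fits one LEA addressing mode. A rough C sketch (my own illustration; the register choices in the comments are hypothetical):

int mul37(int x) {
    int t = x + x * 8;   /* one leal (%eax,%eax,8), %edx : x*9 */
    return x + t * 4;    /* one leal (%eax,%edx,4), %eax : x + 36*x = 37*x */
}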
But this can be done with 2 instructions using LEA (not counting a mov to copy a register); the code is bloated because you compiled without optimization. (Or you tuned for an old CPU where there's maybe some reason to avoid LEA.)
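The decomposition here is a*34 = a*32 + a*2 = (a<<5) + a*2: one shift plus one LEA of the reg + reg*2 form, which is exactly what the optimized gcc5.5 / clang output further down does.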
I think you must have used gcc for this; disabling optimization with other compilers always just uses imul to multiply by a non-power-of-2. But I can't find a gcc version + options on the Godbolt compiler explorer that gives exactly your code (I didn't try every possible combination). MSVC 19.10 -O2 uses the same algorithm as your code, including loading a twice.
Compiling with gcc5.5 (the newest gcc that doesn't just use imul, even at -O0), we get something like your code, but not exactly: the same operations in a different order, and without loading a from memory twice.
# gcc5.5 -m32 -xc -O0 -fverbose-asm -Wall
func:
pushl %ebp #
movl %esp, %ebp #, # make a stack frame
movl 8(%ebp), %eax # a, tmp89 # load a from the stack, first arg is at EBP+8
addl %eax, %eax # tmp91 # a*2
movl %eax, %edx # tmp90, tmp92
sall $4, %edx #, tmp92 # a*2 << 4 = a*32
addl %edx, %eax # tmp92, D.1807 # a*2 + a*32
popl %ebp # # clean up the stack frame
ret
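(Checking the arithmetic: a*2 + ((a*2)<<4) = 2a + 32a = 34a.)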
Compiling with optimization with the same older GCC version on the Godbolt compiler explorer (gcc5.5 -m32 -O3 -fverbose-asm), we get:
# gcc5.5 -m32 -O3. Also clang7.0 -m32 -O3 emits the same code
func:
movl 4(%esp), %eax # a, a # load a from the stack
movl %eax, %edx # a, tmp93 # copy it to edx
sall $5, %edx #, tmp93 # edx = a<<5 = a*32
leal (%edx,%eax,2), %eax # eax = edx + eax*2 = a*32 + a*2 = a*34
ret # with a*34 in EAX, the return-value reg in this calling convention
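Here leal (%edx,%eax,2), %eax does the add and the *2 in one instruction: the addressing-mode math computes edx + eax*2 (without touching FLAGS), so the whole multiply is just one shift plus one LEA after the load and the register copy.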
With gcc 6.x or newer, we get this efficient asm. imul-immediate with a memory source decodes to only a single micro-fused uop on modern Intel CPUs, and integer multiply has only 3-cycle latency on Intel since Core 2 and on AMD since Ryzen (https://agner.org/optimize/).
# gcc6/7/8 -m32 -O3 default tuning
func:
imull $34, 4(%esp), %eax #, a, tmp89
ret
But with -mtune=pentium3, we strangely don't get an LEA. This looks like a missed optimization; LEA has 1-cycle latency on Pentium 3 / Pentium-M.
# gcc8.2 -O3 -mtune=pentium3 -m32 -xc -fverbose-asm -Wall
func:
movl 4(%esp), %edx # a, a
movl %edx, %eax # a, tmp91
sall $4, %eax #, tmp91 # a*16
addl %edx, %eax # a, tmp92 # a*16 + a = a*17
addl %eax, %eax # tmp93 # a*16 * 2 = a*34
ret
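(The mov / sall $5 / leal sequence from the -O3 output above would do the same job in 3 instructions instead of 4, which is why this looks like a missed optimization even when tuning for Pentium 3.)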
This is the same as your code, but uses a reg-reg mov instead of reloading a from the stack to add it to the shift result.