What is GCC's un-optimized asm doing to add two short ints in assembly?

Question

In this code example

typedef struct { short int a1; short int a2;} st1;
typedef struct { int sum; int diff;} st2;

st2 test (st1 s1) {
    st2 store;
    store.sum = s1.a1 + s1.a2;
    store.diff = s1.a1 - s1.a2;
    return store;
}

which GCC, with optimization disabled, compiles to the following 32-bit x86 assembly:

test:
1 endbr32
2 pushl %ebp
3 movl %esp, %ebp
4 subl $16, %esp
5 movzwl 12(%ebp), %eax
6 movswl %ax, %edx
7 movzwl 14(%ebp), %eax
8 cwtl
9 addl %edx, %eax
10 movl %eax, -8(%ebp)
11 movzwl 12(%ebp), %eax
12 movswl %ax, %edx
13 movzwl 14(%ebp), %eax
14 cwtl
15 subl %eax, %edx
16 movl %edx, %eax
17 movl %eax, -4(%ebp)
18 movl 8(%ebp), %ecx
19 movl -8(%ebp), %eax
20 movl -4(%ebp), %edx
21 movl %eax, (%ecx)
22 movl %edx, 4(%ecx)
23 movl 8(%ebp), %eax
24 leave
25 ret $4

I don't understand what's going on in lines 5 to 8 and then 10 to 14. I imagine it's a way to manipulate short ints, but I'd like to better understand what each instruction is doing relative to the C code.
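To tie each instruction back to the C source, here is a minimal C sketch that models lines 5 to 17 of the listing, with 32-bit unsigned variables standing in for EAX and EDX (unsigned so that wrap-around is well defined). The names `test_model`, `sext16`, `check_model` and `regs_result` are hypothetical. The model stops at line 17 and returns the two stack slots of `store`, rather than copying them through the hidden return pointer at 8(%ebp) as lines 18 to 22 do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the two stack slots holding the local `store`. */
typedef struct { uint32_t sum; uint32_t diff; } regs_result;

/* Sign-extend the low 16 bits of a 32-bit register value
   (portable bit trick, no implementation-defined casts). */
static uint32_t sext16(uint32_t r) {
    return ((r & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Hypothetical line-by-line model of asm lines 5-17. mem12 and mem14 stand
   for the 16-bit words at 12(%ebp) (s1.a1) and 14(%ebp) (s1.a2). */
static regs_result test_model(uint16_t mem12, uint16_t mem14) {
    uint32_t eax, edx, slot_m8, slot_m4;

    eax = mem12;        /*  5 movzwl 12(%ebp),%eax : zero-extend s1.a1 into EAX */
    edx = sext16(eax);  /*  6 movswl %ax,%edx      : sign-extend AX into EDX     */
    eax = mem14;        /*  7 movzwl 14(%ebp),%eax : zero-extend s1.a2 into EAX */
    eax = sext16(eax);  /*  8 cwtl                 : sign-extend AX into EAX     */
    eax = eax + edx;    /*  9 addl %edx,%eax       : (int)a1 + (int)a2           */
    slot_m8 = eax;      /* 10 movl %eax,-8(%ebp)   : store.sum = result          */

    eax = mem12;        /* 11 reload s1.a1: -O0 re-reads memory per statement   */
    edx = sext16(eax);  /* 12 movswl %ax,%edx                                   */
    eax = mem14;        /* 13 reload s1.a2                                      */
    eax = sext16(eax);  /* 14 cwtl                                              */
    edx = edx - eax;    /* 15 subl %eax,%edx       : (int)a1 - (int)a2           */
    eax = edx;          /* 16 movl %edx,%eax                                    */
    slot_m4 = eax;      /* 17 movl %eax,-4(%ebp)   : store.diff = result         */

    return (regs_result){ slot_m8, slot_m4 };
}

/* Compare the model against what the C source computes. */
static void check_model(int16_t a1, int16_t a2) {
    regs_result r = test_model((uint16_t)a1, (uint16_t)a2);
    assert(r.sum == (uint32_t)(a1 + a2));
    assert(r.diff == (uint32_t)(a1 - a2));
}
```

For example, with `a1 = -2` and `a2 = 7`, the two slots end up holding the bit patterns of 5 and -9.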

It's over-complicated because you told GCC not to optimize. It's first loading a 16-bit integer into EAX using `movzx` (movzwl) to avoid false dependencies, and *then* it's remembering that it needs to sign-extend it to 32 bits, so it uses `movswl %ax, %edx` to copy the sign-extended value to the register where it really wants it, after using EAX as a temporary. Put your code on https://godbolt.org/ and use `-fverbose-asm` to see which asm lines match which source lines. And see [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116): write a function that's still interesting to look at with optimization enabled. — Peter Cordes, Sep 09 '21 at 00:43
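The claim that the zero-extending `movzwl` load followed by `movswl %ax, %edx` yields the same value as a single sign-extending load can be checked exhaustively. A minimal C sketch; the helper names are hypothetical, and it models only the values, not the false-dependency performance effect that motivates the zero-extending load:

```c
#include <assert.h>
#include <stdint.h>

/* movzwl: zero-extend a 16-bit memory operand to 32 bits. */
static uint32_t zero_extend16(uint16_t mem) { return mem; }

/* movswl / cwtl: sign-extend the low 16 bits of a 32-bit value. */
static uint32_t sign_extend16(uint32_t r) {
    return ((r & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Reference: the 16-bit two's-complement pattern read as a signed number. */
static int32_t as_signed16(uint16_t v) {
    return v < 0x8000u ? (int32_t)v : (int32_t)v - 0x10000;
}

/* Every 16-bit input: zero-extend then sign-extend == direct sign-extension. */
static void check_all_inputs(void) {
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t mem = (uint16_t)v;
        /* -O0 path: movzwl mem,%eax then movswl %ax,%edx */
        uint32_t two_step = sign_extend16(zero_extend16(mem));
        assert(two_step == (uint32_t)as_signed16(mem));
    }
}
```

For the pattern 0xFFFF (short -1), the zero-extended temporary in EAX is 65535, which is why the sign-extension step is still required before the add.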
Partly a duplicate of [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) and [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116), unless you actually *want* to ask about why un-optimized GCC first does zero-extending loads into an EAX temporary before sign-extending into the right register. i.e. details of compiler behaviour, not the way you'd actually just movswl (MOVSX) load the integers from memory and then LEA to add them without destroying either sign-extension result, the — Peter Cordes, Sep 09 '21 at 00:50
Should only take 4 instructions to get both the sum and difference in registers. In a calling convention that returned 64-bit structs in EDX:EAX, you'd be done, just a `ret` needed, otherwise you need to load a pointer and do 2 stores. — Peter Cordes, Sep 09 '21 at 00:51
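The 4-instruction version (two sign-extending `movswl` loads, an `lea` for the sum that leaves both inputs intact, and a `sub` for the difference) can be modeled the same way. A minimal sketch, assuming a hypothetical register assignment (a1 in EDX, a2 in ECX) chosen so the sum lands in EAX and the difference in EDX, the layout a convention returning 8-byte structs in EDX:EAX would presumably expect (first member in EAX). Memory operands are left symbolic because their offsets depend on the calling convention; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns EDX:EAX packed into 64 bits (EAX in the low half). */
static uint64_t sum_diff_edx_eax(int16_t a1, int16_t a2) {
    uint32_t edx = (uint32_t)(int32_t)a1;  /* movswl <a1>, %edx                     */
    uint32_t ecx = (uint32_t)(int32_t)a2;  /* movswl <a2>, %ecx                     */
    uint32_t eax = edx + ecx;              /* leal (%edx,%ecx), %eax : sum, EDX/ECX kept */
    edx -= ecx;                            /* subl %ecx, %edx        : difference   */
    return ((uint64_t)edx << 32) | eax;
}

/* Check the model against the C source's int arithmetic. */
static void check_against_c(int16_t a1, int16_t a2) {
    uint64_t r = sum_diff_edx_eax(a1, a2);
    assert((uint32_t)r == (uint32_t)(a1 + a2));          /* EAX holds the sum  */
    assert((uint32_t)(r >> 32) == (uint32_t)(a1 - a2));  /* EDX holds the diff */
}
```

Under the listing's own convention, the struct is instead returned through a hidden pointer (loaded from 8(%ebp)), so two stores through that pointer would follow.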
Or maybe you're wondering why there's sign-extension before the add/sub? In C, narrow types implicitly convert to `int` when used as operands to binary operators like `+`. So if you'd done `long long sum = int + int`, the addition result would still be an `int`, and promotion to the wider result type would happen after. That would make signed overflow possible for the add (which is undefined behaviour in the C abstract machine). A 64-bit compiler might use sign-extending loads anyway, instead of doing sign-extension after an add, because UB means it doesn't have to implement wrapping. — Peter Cordes, Sep 09 '21 at 00:55
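The promotion rule can be checked directly. A minimal C11 sketch, assuming a 16-bit `short` and a 32-bit `int` as on this i386 target; the `IS_INT` macro and `promotion_checks` function are hypothetical helpers:

```c
#include <assert.h>
#include <limits.h>

/* 1 if the expression has type int, else 0 (C11 _Generic; x is not evaluated). */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

static void promotion_checks(void) {
    short a = SHRT_MAX, b = SHRT_MAX;

    /* Both short operands are promoted to int before + and - are applied. */
    assert(IS_INT(a + b));
    assert(IS_INT(a - b));

    /* So the sum is computed in 32 bits and does not wrap at 16 bits. */
    assert(a + b == 65534);

    /* With int operands, the add happens in int and only the result is
       widened: `long long s = i + j;` with i == INT_MAX and j == 1 would be
       signed overflow (UB). Converting one operand first avoids that. */
    int i = INT_MAX, j = 1;
    long long s = (long long)i + j;
    assert(s == 2147483648LL);
}
```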
@paul, as someone who prefers gas, you should know that `movl %eax, -8(%ebp)` **stores** eax to memory. — prl, Sep 09 '21 at 04:31
It uses two different instructions to sign extend the two "short" fields. `movswl` sign extends and puts the result into a different register, while `cwtl` sign extends ax into eax. — prl, Sep 09 '21 at 04:35
