I've written the following basic C program:

int main() {
    char a = 1;
    char b = 5;
    return a + b;
}

And it compiles in godbolt as:

main:
  pushq %rbp
  movq %rsp, %rbp
  movb $1, -1(%rbp)
  movb $5, -2(%rbp)
  movsbl -1(%rbp), %edx
  movsbl -2(%rbp), %eax
  addl %edx, %eax
  popq %rbp
  ret

I have a few questions about the compiled asm:

  • Is movb used for 1byte (char), movw for 2byte (short), movl for 4byte (int), and movq for 8byte (int) integers? What then is just mov used for, without an extension?
  • Why is an offset used for movb $1, -1(%rbp) and movb $5, -2(%rbp)? Why aren't the two numbers just moved into two different registers? For example, there's an addl %edx, %eax later on... why aren't the two numbers just moved into those two registers?
  • Why is movsbl used here? Why aren't the numbers just moved directly into the registers?
  • Is pushq / popq pushing/popping an 8byte pointer onto the stack? If so, what's the point of the movq %rsp, %rbp?
David542
  • Also it's worth turning on optimization. Then the result will be `main: movl $6, %eax; ret` – MikeCAT Aug 02 '20 at 01:21
  • @MikeCAT -- thanks, why `movl` instead of `movb` if the number is 6? – David542 Aug 02 '20 at 01:23
  • Because the return value is an `int`. It would be an error to leave the high 3 bytes of EAX holding whatever garbage main's caller left there. – Peter Cordes Aug 02 '20 at 01:26
  • As for why store to memory then movsbl: because they're separate statements and you compiled without optimization. [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) – Peter Cordes Aug 02 '20 at 01:28
  • @PeterCordes thanks, what does the "l" (lowercase L) in `movl` stand for? I would think it is long, but long is 8 bytes (or maybe that's `long long`)? – David542 Aug 02 '20 at 01:29
  • *What then is just mov used for, without an extension?* - nothing. Instructions always have an operand-size. Compilers choose to make that explicit. But you can omit the AT&T operand-size suffix when it's implied by a register operand, like in all of these cases. For `mov` it's only necessary for `mov $imm, (mem)` where neither operand is a register; with no suffix the size is ambiguous. – Peter Cordes Aug 02 '20 at 01:30
  • @David542: asm names for widths were established back in 32-bit days, when 386 was new. asm `long` is a dword, C `int32_t`, and what used to be C `long` in `gcc -m32` 32-bit mode. – Peter Cordes Aug 02 '20 at 01:32
  • @PeterCordes -- I see, thanks. And then `q` is quadWord? = 8 bytes? – David542 Aug 02 '20 at 01:34
  • This is a duplicate of like 3 or 4 different questions. Read some tutorials and/or the GAS manual, but avoid asking 3 unrelated questions in one post, even if they're about the same code. (And yes, of course `q` is quadword, that's why GCC is using it on instructions with 64-bit register operands. Note that `int` is not 8 bytes, that's `long` or `long long`, or `void*`) – Peter Cordes Aug 02 '20 at 01:34
  • More duplicates that SO's dup list didn't have room for: [Questions about AT&T x86 Syntax design](https://stackoverflow.com/q/4193827) / [What is the purpose of the RBP register in x86\_64 assembler?](https://stackoverflow.com/a/41914096) (RBP as a frame pointer) / [Why does %rbp point to nothing?](https://stackoverflow.com/q/44687662) – Peter Cordes Aug 02 '20 at 01:39
  • If there's anything that's not a duplicate of those links, I guess ask a new question if you can't answer it yourself with some research. – Peter Cordes Aug 02 '20 at 01:40
