Given this simple C program:

#include <stdio.h>

struct foo{
    int a;
    char c;
};

static struct foo save_foo;

int main(){
    struct foo foo = { 97, 'c', };
    save_foo = foo;

    printf("%c\n",save_foo.c);
}

Here the save_foo variable is in the bss segment, and in main I am trying to copy from the stack variable foo to the uninitialized save_foo. So I would expect both members, foo.a and foo.c, to be copied into save_foo.a and save_foo.c.

However, the generated assembly:

.text
    .local  save_foo
    .comm   save_foo,8,8
    .section    .rodata
.LC0:
    .string "%c\n"
    .text
    .globl  main
    .type   main, @function
main:
    endbr64 
    pushq   %rbp    #
    movq    %rsp, %rbp  #,
    subq    $16, %rsp   #,
# a.c:11:   struct foo foo = { 97, 'c', };
    movl    $97, -8(%rbp)   #, foo.a
    movb    $99, -4(%rbp)   #, foo.c
# a.c:12:   save_foo = foo;
    movq    -8(%rbp), %rax  # foo, tmp86

##################################################################
    #MISSING to copy foo.c to save_foo.c yet able to use that value
    
     #movq -4(%rbp), %rcx
     #movq  %rcx, 4+save_foo(%rip)
##################################################################

    movq    %rax, save_foo(%rip)    # tmp86, save_foo
# a.c:14:   printf("%c\n",save_foo.c);
    movzbl  4+save_foo(%rip), %eax  # save_foo.c, _1
# a.c:14:   printf("%c\n",save_foo.c);
    movsbl  %al, %eax   # _1, _2
    movl    %eax, %esi  # _2,
    leaq    .LC0(%rip), %rdi    #,
    movl    $0, %eax    #,
    call    printf@PLT  #
    movl    $0, %eax    #, _9
# a.c:15: }
    leave   
    ret 
    .size   main, .-main
    .ident  "GCC: (Ubuntu 10.2.0-13ubuntu1) 10.2.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long    1f - 0f
    .long    4f - 1f
    .long    5
0:
    .string  "GNU"
1:
    .align 8
    .long    0xc0000002
    .long    3f - 2f
2:
    .long    0x3
3:
    .align 8
4:

It looks as if only one member (foo.a) is copied, and foo.c is not. How can movzbl 4+save_foo(%rip), %eax get the right value (99, the ASCII code for 'c') when that value was never copied? (There is no movl from -4(%rbp), where the value lives, to 4+save_foo(%rbp) in the bss segment.) Shouldn't the value at 4+save_foo(%rbp) be zero, since save_foo is uninitialized?

mediocrevegetable1
milanHrabos
  • 1
    `4+save_foo(%rbp)` looks like a typo for `4+save_foo(%rip)`. RBP is being used as a frame pointer here (because of an unoptimized build), and `save_foo` is a label in BSS, not a `.equ` small integer offset relative into a stack frame. – Peter Cordes Mar 19 '21 at 13:04

2 Answers


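The movq instruction copies 8 bytes, which is the full size of struct foo here (note the `.comm save_foo,8,8` directive), so the data of the entire struct is copied by these two instructions: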

    movq    -8(%rbp), %rax  # foo, tmp86
    movq    %rax, save_foo(%rip)    # tmp86, save_foo
MikeCAT

movq -8(%rbp), %rax is an 8-byte reload of the whole struct. Note the l vs. q operand-size suffixes, as well as the register names which also indicate operand-size. (Assembly registers in 64-bit architecture)

When you ask GCC to copy a whole object by doing C struct assignment, it uses wider regs, up to 16-byte XMM regs, just like for memcpy. (Or for large-enough things, might insert a call memcpy instead of expanding it inline.)

Your proposed movq %rcx, 4+save_foo(%rip) would store 8 bytes starting halfway through the 8-byte global, so it would write 4 bytes past its end.

If you wanted to do both halves separately like save_foo.a = foo.a; save_foo.c = foo.c;, you'd use %eax and %ecx, or %ecx twice. (With movl, not movq). Or maybe a movzbl byte load and a movb byte or movl dword store, depending on whether GCC chose to overwrite the padding or not in the destination, like it does when copying the whole object.

Peter Cordes
  • `movb` with `%cl` should be better instead of `movl` with `%ecx` because `foo.c` is `char` (typically 1-byte long) – MikeCAT Mar 19 '21 at 12:43
  • so if the struct were bigger (say `sizeof(64)`), then multiple registers, holding each elements of that struct would be involved? Or would it be copying from one stack area to another? – milanHrabos Mar 19 '21 at 12:43
  • 2
    @milanHrabos [Let's try](https://gcc.godbolt.org/z/q4zx5E). – MikeCAT Mar 19 '21 at 12:46
  • @MikeCAT basically only `rax` and `rdx` were involved in copying. thanks anyway – milanHrabos Mar 19 '21 at 12:49
  • 1
    @MikeCAT: Thanks, yeah, GCC uses a `movb` store when only writing the `char` member. But it uses movzbl as a byte load because it doesn't have any need to merge into the low byte of RAX. https://gcc.godbolt.org/z/Gjz3Kf You almost never want to do `movb (mem), %al`, except when optimizing for code-size over speed, or when you *want* to do an 8-bit bitfield insert. – Peter Cordes Mar 19 '21 at 12:54
  • 2
    @milanHrabos: yup, and if you enable optimization (and make the huge global volatile), you'll see GCC use `movdqu` with XMM regs, doing memcpy like I said in my answer. https://gcc.godbolt.org/z/WvG4Wf. At -O3 it copies from .rodata to the stack with XMM, and then copies again to .bss, but at -O2 it uses `mov $imm64, %rax` and qword stores to store pairs of ints, then reloads with movdqu, only finally storing with movaps. (It's generally good to do a few loads then a few stores, although modern x86 CPUs are all out-of-order exec so they can hide that load latency themselves.) – Peter Cordes Mar 19 '21 at 13:00
  • 2
    Also, in the first Godbolt link in my reply to MikeCAT, you can see how GCC implements `save_foo.a = foo.a; save_foo.c = foo.c;` at -O0. Yeah, it's only ever going to use RAX unless it needs more than 1 scratch reg within one statement. Remember, un-optimized mode is braindead compile-fast without even bothering to do register allocation for the whole function, and compiling each statement to a separate block of asm ([for consistent debugging](https://stackoverflow.com/questions/53366394/why-does-clang-produce-inefficient-asm-with-o0-for-this-simple-floating-point)) – Peter Cordes Mar 19 '21 at 13:02