I am trying to understand how function calls work at the machine level. To do that, I wrote a dummy C++ function like this:

int func(int a[], int n) {
    char arr[n];
    arr[n - 99] = 100; //to prevent compiler from optimizing my function to noop
    return arr[12]; //to prevent compiler from optimizing my function to noop
}

I got the following assembly when compiling with x86-64 clang 13.0.1:

push    rbp
mov     rbp, rsp
mov     eax, esi
mov     rcx, rsp
add     rax, 15 // Instruction 1
and     rax, -16 // Instruction 2
mov     rdx, rcx
sub     rdx, rax
mov     rsp, rdx
movsxd  rsi, esi
mov     byte ptr [rsi + rdx - 99], 100
neg     rax
movsx   eax, byte ptr [rcx + rax + 12]
mov     rsp, rbp
pop     rbp
ret

Instructions 1 and 2 compute the size of the array arr from the variable n: the result is n rounded up to the next multiple of 16. For example, even if n = 1, clang allocates 16 bytes of memory. Why does clang make the array size a multiple of 16 bytes?
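To make the arithmetic concrete, here is a minimal C++ sketch of what instructions 1 and 2 appear to compute. The function names `round_up_to_16` and `vla_bytes_reserved` are hypothetical, chosen for illustration; they are not anything the compiler emits or exposes.

```cpp
#include <cassert>
#include <cstdint>

// Model of `add rax, 15` followed by `and rax, -16`.
// -16 is ...11110000 in two's complement, so the AND clears the low four
// bits; adding 15 first makes the result round up instead of down.
constexpr std::uint64_t round_up_to_16(std::uint64_t n) {
    return (n + 15) & ~std::uint64_t{15};
}

static_assert(round_up_to_16(0) == 0, "0 stays 0");
static_assert(round_up_to_16(1) == 16, "1 byte reserves 16");
static_assert(round_up_to_16(16) == 16, "exact multiples are unchanged");
static_assert(round_up_to_16(17) == 32, "17 bytes reserve 32");

// Model of `mov eax, esi` plus the rounding: the 32-bit n is zero-extended
// into rax before the add/and pair. This sketch only models non-negative n.
inline std::uint64_t vla_bytes_reserved(int n) {
    assert(n >= 0 && "sketch only models non-negative lengths");
    return round_up_to_16(static_cast<std::uint32_t>(n));
}
```

So `vla_bytes_reserved(100)` gives 112, which is the amount subtracted from the stack pointer for `char arr[100]` in this model.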

I tried different element types for arr, but the behavior is the same in every case. So I don't think this is for alignment purposes, since different data types have different alignment requirements.

I checked with different compilers as well. All of them allocate arrays whose sizes are multiples of 16. There is something fundamental that I am missing.

Here is the link as reference.

Peter Cordes
driewguy
  • Performance optimization (memory access). BTW, variable array sizes are not legal C++. – doug Mar 25 '22 at 01:27
  • @doug: Not in pure ISO C++, but clang implements the GNU dialect of C++ where they are available as an extension. https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html. That's why this compiles as C++ not just C, as long as you don't use `-Wpedantic -Werror` https://godbolt.org/z/59n8ThE3b. (Even `-std=c++17` to override the default `-std=gnu++14` doesn't break it.) – Peter Cordes Mar 25 '22 at 01:51
  • @PeterCordes I never understood why GCC doesn't disable extensions to the ISO C++ when `-std=c++XX` is explicitly requested. Instead it needs to be combined with `-pedantic-errors` (or `-Wpedantic -Werror` as you mentioned). – user17732522 Mar 25 '22 at 01:57
  • @user17732522: It does keep the namespace clean, disabling keywords like `asm`, leaving only `__asm__`. IDK how useful it is to fully disable extensions; normally you don't want to disable `__asm__`, and disabling `__attribute__((__vector_size__(16)))` would break `immintrin.h`. (Of course just disabling the `vector_size` not `__vector_size__` and so on would let headers keep working.) As far as other extensions, IDK how you'd go about disabling union type-punning; add some extra code into the compiler to on-purpose break such code? – Peter Cordes Mar 25 '22 at 02:02
  • @PeterCordes What I meant was simply to put out diagnostics as the standard requires when `-std=c++XX` is given. Using `__asm__`, `__attribute__`, or union type-punning wouldn't cause a diagnostic to become required according to the standard anyway. I just think the way the compiler options are chosen is a bit weird. – user17732522 Mar 25 '22 at 02:23

1 Answer

The System V psABI for x86-64 requires local and global arrays of at least 16 bytes, as well as all variable-length arrays, to be aligned to at least 16 bytes so that they are correctly aligned for SSE operations.

As @PeterCordes explains under this answer, there are also further reasons to keep the stack pointer 16-byte aligned, although those wouldn't matter for the specific function you are giving as an example.
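As a rough illustration of why the size is rounded, the following sketch treats the stack pointer as a plain integer and compares the two ways of reserving n bytes. Everything here is an assumption made for illustration: the helper names are hypothetical, and `kExampleRsp` is a made-up 16-byte-aligned address, not a value taken from a real process.

```cpp
#include <cassert>
#include <cstdint>

inline bool is_16_aligned(std::uint64_t addr) { return addr % 16 == 0; }

// What the compiler output in the question does: subtract the size
// rounded up to a multiple of 16, so an aligned rsp stays aligned.
inline std::uint64_t alloc_rounded(std::uint64_t rsp, std::uint64_t n) {
    assert(is_16_aligned(rsp));
    return rsp - ((n + 15) & ~std::uint64_t{15});
}

// What would happen if exactly n bytes were subtracted instead.
inline std::uint64_t alloc_exact(std::uint64_t rsp, std::uint64_t n) {
    assert(is_16_aligned(rsp));
    return rsp - n;
}

// Made-up starting value for the stack pointer (assumption).
const std::uint64_t kExampleRsp = 0x7ffc00001000ULL;
```

With these definitions, `alloc_rounded(kExampleRsp, 1)` yields `0x7ffc00000ff0`, which is still a multiple of 16, whereas `alloc_exact(kExampleRsp, 1)` yields `0x7ffc00000fff`, so neither the array nor the new stack pointer would be 16-byte aligned.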

Note, however, that variable-length arrays are only supported in C++ as a compiler-specific extension in the first place. They are not allowed in standard ISO C++, but have been supported in C since C99.

For reference you can find links to the ABI specification (draft) here. The requirement is given in section 3.1.2 under the heading "Aggregates and Unions".

user17732522
  • Also, moving RSP by a multiple of 16 keeps it aligned for future use. (e.g. function calls or further VLAs / allocas). But there aren't any in this function so this is the only justification that's actually fully required, rather than just "how compilers do it". Of course, the ABI having a say in private internals of a function's stack-frame is a bit of an over-reach; that might be more a case of documenting what GCC does that slipped into the standard. – Peter Cordes Mar 25 '22 at 01:45
  • re: further reasons for caring about 16-byte stack alignment in general: [Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/q/49391001) (stack pointer before a `call`, and thus of the first stack arg if any.) – Peter Cordes Mar 25 '22 at 02:30