
I wrote a very simple C++ function in VS2019 Community Edition, and I have a question about the corresponding disassembly.

Function:

void manip(char* a, char* b, long* mc) {
    long lena = 0;
    long lenb = 0;
    long lmc;
    long i, j;
    for (; a[lena] != NULL; lena++);
    for (; b[lenb] != NULL; lenb++);
    lmc = lena + lenb + *mc;
    for (i=0; i < lena; i++) a[lena] = a[lena] + lmc;
    for (j=0; j < lenb; j++) b[lenb] = b[lenb] + lmc;
}

Disassembly (Excerpt):

void manip(char* a, char* b, long* mc) {
00007FF720DE1910  mov         qword ptr [rsp+18h],r8  
00007FF720DE1915  mov         qword ptr [rsp+10h],rdx  
00007FF720DE191A  mov         qword ptr [rsp+8],rcx  
00007FF720DE191F  push        rbp  
00007FF720DE1920  push        rdi  
00007FF720DE1921  sub         rsp,188h  
00007FF720DE1928  lea         rbp,[rsp+20h]  
00007FF720DE192D  mov         rdi,rsp  
00007FF720DE1930  mov         ecx,62h  
00007FF720DE1935  mov         eax,0CCCCCCCCh  
00007FF720DE193A  rep stos    dword ptr [rdi]  

In the first three lines, the arguments are stored on the stack before the frame pointer is set up; RBP is pushed after that. What troubles me are the following three lines:

00007FF720DE1921  sub         rsp,188h  
00007FF720DE1928  lea         rbp,[rsp+20h]  
00007FF720DE192D  mov         rdi,rsp

As I understand it, the first of these three lines reserves space on the stack.

Questions:

  1. I do not understand why such a huge space (188h = 392 bytes) is reserved when we only need enough to store 5 longs, which is no more than 5*4 = 20 (14h) bytes.
  2. The second line calculates the new frame pointer, but I don't understand where 20h (32) comes from.
  3. I also don't understand the significance of the third line.
Peter Cordes
ultimate cause
  • This is MSVC debug mode; it reserves extra space and poisons it (with `rep stosd`, hence the `mov rdi,rsp`) to help detect out-of-bounds access errors. (Even though none of the locals have their address taken...) – Peter Cordes Apr 10 '21 at 07:19
  • Also note that `a[lena] != NULL` is not appropriate for C or C++. `NULL` is a pointer constant, not a character. It often happens to be a macro `#define` for `0`, rather than `((void*)0)`, which is why it works for comparing against a `char`, but you're mixing it up with ASCII NUL, `'\0'`. – Peter Cordes Apr 10 '21 at 07:21
  • @PeterCordes Agreed, this snippet is dummy code. Your other comment seems very useful. Unfortunately, in Release mode the function was inlined, so I cannot confirm this. Please post this as an answer and I will accept it. Also, if possible, please help with my other questions. – ultimate cause Apr 10 '21 at 08:15
  • If you don't include a caller in your program, there's nothing for it to inline into. It's not `static`, so there should still have been a stand-alone definition anyway. https://godbolt.org/z/GfEo8jMGq shows `-O2` output. See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) for examples of writing functions that compile to interesting asm. The parts about optimization and args apply in general, although of course the option names are specific to GCC. – Peter Cordes Apr 10 '21 at 08:21
  • @PeterCordes: Again, I agree. I posted just an excerpt, not the caller. Now I have tried disabling optimizations and inlining, and it yields exactly what you said: the reservation is reduced to 40 bytes, and the RBP frame pointer is omitted. Thanks for the help. – ultimate cause Apr 10 '21 at 08:27
  • The large stack frame allows adding local variables while debugging, an Edit+Continue feature. Looking at debug-built code isn't that useful. Consider `__declspec(noinline)` if you want to see the optimized code. – Hans Passant Apr 10 '21 at 12:16

1 Answer


This is MSVC debug mode; it reserves extra space and poisons it with 0xCC (with rep stosd aka memset in a can, hence the mov rdi,rsp to set the destination) to help detect out-of-bounds access errors. (Even though none of the locals have their address taken and none are arrays...)
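To make the arithmetic concrete: mov ecx,62h sets the repeat count to 0x62 = 98 dwords (writing ECX zero-extends into RCX, the count register rep uses in 64-bit code), and 98 × 4 bytes = 392 = 0x188 bytes, exactly the amount reserved by sub rsp,188h. So the whole reserved area from RSP upward gets filled with 0xCC bytes. The sketch below models rep stos dword ptr [rdi] as a plain loop; rep_stosd, all_bytes_cc and demo are hypothetical names for illustration, and only the constants come from the question's disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Constants copied from the question's disassembly.
constexpr std::size_t   kReserved   = 0x188;        // sub rsp,188h
constexpr std::size_t   kDwordCount = 0x62;         // mov ecx,62h
constexpr std::uint32_t kPoison     = 0xCCCCCCCCu;  // mov eax,0CCCCCCCCh

// 98 dwords * 4 bytes = 392 bytes: the fill covers exactly the reservation.
static_assert(kDwordCount * 4 == kReserved, "rep count matches sub rsp");

// Model of rep stos dword ptr [rdi]: store EAX at [RDI], advance RDI by 4 bytes,
// decrement RCX, repeat until RCX is 0 (assumes the direction flag is clear).
void rep_stosd(std::uint32_t* rdi, std::uint32_t eax, std::size_t rcx) {
    while (rcx != 0) {
        *rdi++ = eax;
        --rcx;
    }
}

// Every byte of 0xCCCCCCCC is 0xCC, so the result is the same as
// memset(rsp, 0xCC, 0x188).
bool all_bytes_cc(const void* p, std::size_t n) {
    const auto* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        if (bytes[i] != 0xCC) return false;
    }
    return true;
}

void demo() {
    std::uint32_t frame[kDwordCount];  // stands in for [rsp, rsp+188h)
    rep_stosd(frame, kPoison, kDwordCount);
    assert(all_bytes_cc(frame, sizeof frame));
}
```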

It's a surprising amount of extra stack space; I don't know how MSVC chooses how much to reserve. In Release mode (-O2 optimization https://godbolt.org/z/GY7xTYWKq), it of course doesn't touch the stack at all.

Debug mode must be adding some extra options that aren't default for MSVC's command line (presumably the IDE Debug configuration's /RTC1 run-time checks, which do the 0xCC fill, and /ZI for Edit and Continue, which Hans mentions), because I can't reproduce this code-gen on https://godbolt.org/z/nGo9516b7 with MSVC 19.10 or 19.28. I just get sub rsp, 40 after spilling the incoming register args to the shadow space, and RBP isn't even set up as a frame pointer. (I guess because it's a leaf function.)

lea rbp,[rsp+20h] seems to be setting up RBP as a frame pointer, presumably, but it's not pointing at the saved RBP right below the return address. With some code showing how it uses RBP, maybe we could figure that out. (Look at the compiler's asm output, not the disassembly, so you get symbolic names for local vars.)
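Working through the prologue's arithmetic at least shows where RBP ends up. The sketch below tracks byte offsets relative to RSP at function entry (the slot holding the return address) using only the instructions from the question; Frame, run_prologue and from_rbp are hypothetical names. One guess, not confirmed by the excerpt: the 20h (32 bytes) left below RBP matches the size of the Windows x64 shadow space a function must reserve for its own callees, so RBP may simply point at the first byte above that area, if this Debug build makes calls that the excerpt doesn't show.

```cpp
#include <cassert>
#include <cstdint>

// Offsets in bytes, relative to RSP at function entry (call it S).
// [S] holds the return address; the three spills go to S+8, S+10h, S+18h.
constexpr std::int64_t kHomeA  = 0x08;  // mov qword ptr [rsp+8],rcx   (a)
constexpr std::int64_t kHomeB  = 0x10;  // mov qword ptr [rsp+10h],rdx (b)
constexpr std::int64_t kHomeMc = 0x18;  // mov qword ptr [rsp+18h],r8  (mc)

struct Frame {
    std::int64_t saved_rbp;  // where push rbp stored the caller's RBP
    std::int64_t saved_rdi;  // where push rdi stored RDI
    std::int64_t rsp;        // RSP after sub rsp,188h
    std::int64_t rbp;        // RBP after lea rbp,[rsp+20h]
};

constexpr Frame run_prologue() {
    std::int64_t rsp = 0;                 // S
    rsp -= 8;                             // push rbp
    const std::int64_t saved_rbp = rsp;
    rsp -= 8;                             // push rdi
    const std::int64_t saved_rdi = rsp;
    rsp -= 0x188;                         // sub rsp,188h
    const std::int64_t rbp = rsp + 0x20;  // lea rbp,[rsp+20h]
    return Frame{saved_rbp, saved_rdi, rsp, rbp};
}

constexpr Frame kFrame = run_prologue();

// On Windows x64, entry RSP is 8 mod 16 (the call just pushed the return
// address), so 8 + kFrame.rsp being a multiple of 16 means RSP ends up aligned.
static_assert((8 + kFrame.rsp) % 16 == 0, "RSP is 16-byte aligned after the prologue");

// Offset of an entry-relative slot as seen from the new RBP.
constexpr std::int64_t from_rbp(std::int64_t entry_offset) {
    return entry_offset - kFrame.rbp;
}
```

With these numbers RSP ends 0x198 bytes below the entry RSP, RBP ends 0x170 bytes below the saved RBP, and the spilled a, b and mc would be addressable as [rbp+180h], [rbp+188h] and [rbp+190h].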


And BTW, the optimized asm for the loops is much more readable if you want to actually see how the loop logic works.

Your code is full of reloads of the pointers from the stack, and movsxd sign-extension because you used signed integers as array indexes that weren't pointer-width, and the compiler didn't optimize into pointer-increments or at least into 64-bit integers. (Signed-overflow being UB allows this optimization.)
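For comparison, here is a sketch of manip() with pointer-width unsigned indexes (std::size_t). Since the index is already 64-bit, there's nothing to sign-extend, so even a debug build shouldn't need movsxd for the index. It also compares against '\0' rather than NULL, as pointed out in the comments, and it updates a[i] / b[j]: the original loops index with lena and lenb, so they only ever modify the terminating NUL, which I'm assuming wasn't the intent. manip_fixed is just an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative variant of manip(), not the original code.
void manip_fixed(char* a, char* b, long* mc) {
    assert(a != nullptr && b != nullptr && mc != nullptr);

    // Pointer-width unsigned lengths: no sign-extension needed when indexing.
    std::size_t lena = 0;
    std::size_t lenb = 0;
    while (a[lena] != '\0') ++lena;  // '\0' is the NUL character; NULL is a null pointer constant
    while (b[lenb] != '\0') ++lenb;

    const long lmc = static_cast<long>(lena + lenb) + *mc;

    // Assumes the intent was to adjust every character, not just the terminator.
    for (std::size_t i = 0; i < lena; ++i) a[i] = static_cast<char>(a[i] + lmc);
    for (std::size_t j = 0; j < lenb; ++j) b[j] = static_cast<char>(b[j] + lmc);
}
```

For example, with a = "ab", b = "c" and *mc = 0, lmc is 3, so a becomes "de" and b becomes "f".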

Much of the advice in How to remove "noise" from GCC/clang assembly output? applies to writing functions that are interesting to look at when optimized.
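Along the lines of Hans Passant's __declspec(noinline) suggestion in the comments, one way to keep a function out of line in an optimized build is to mark it noinline and give it a caller. This sketch wraps the compiler-specific spellings (__declspec(noinline) for MSVC, __attribute__((noinline)) for GCC/clang) in a NOINLINE macro; count_len and total_len are hypothetical names, with count_len being manip()'s first loop written against '\0'.

```cpp
#include <cassert>
#include <cstddef>

// Compiler-specific "do not inline" attribute, wrapped in a macro.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical helper: manip()'s length loop, comparing against '\0'.
NOINLINE std::size_t count_len(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical caller. Without NOINLINE an optimizing compiler may inline
// count_len here; with it, the calls should stay visible in the asm.
std::size_t total_len(const char* a, const char* b) {
    return count_len(a) + count_len(b);
}
```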

Peter Cordes