9

Inspired by this question.

Apparently in the following code:

#include <Windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
    if( GetTickCount() > 1 ) {
        char buffer[500 * 1024];
        SecureZeroMemory( buffer, sizeof( buffer ) );
    } else {
        char buffer[700 * 1024];
        SecureZeroMemory( buffer, sizeof( buffer ) );
    }
    return 0;
}

compiled with default stack size (1 megabyte) with Visual C++ 10 with optimizations on (/O2) a stack overflow occurs because the program tries to allocate 1200 kilobytes on stack.

The code above is of course slightly exaggerated to show the problem - uses lots of stack in a rather dumb way. Yet in real scenarios stack size can be smaller (like 256 kilobytes) and there could be more branches with smaller objects that would induce a total allocation size enough to overflow the stack.

That makes no sense. The worst case would be 700 kilobytes - it would be the codepath that constructs the set of local variables with the largest total size along the way. Detecting that path during compilation should not be a problem.

So the compiler produces a program that tries to allocate even more memory than the worst case. According to this answer LLVM does the same.

That could be a deficiency in the compiler or there could be some real reason for doing it this way. I mean maybe I just don't understand something in compilers design that would explain why doing allocation this way is necessary.

Why would the compiler want a program allocate more memory than the code needs in the worst case?

Community
  • 1
  • 1
sharptooth
  • 167,383
  • 100
  • 513
  • 979
  • Because the compiler is not smart enough to figure out the worst case? – R. Martinho Fernandes Aug 17 '11 at 09:57
  • @R. Martinho Fernandes: Maybe, I called it *a deficiency in the compiler* in the question. – sharptooth Aug 17 '11 at 09:58
  • 1
    Oh. I interpreted that to mean a buggy compiler. – R. Martinho Fernandes Aug 17 '11 at 10:00
  • 1
    Looks like it's just an optimiser deficiency. Why are you allocating large buffers on the stack? That seems silly; perhaps that's why the devs didn't spend their time micro-optimising it :) – Lightness Races in Orbit Aug 17 '11 at 10:03
  • @Tomalak Geret'kal: You don't see the big picture. Suppose you have a large `switch` or a huge block of `if-else-is` or anything like that. You might have a rather small object in each branch, but with this compiler strategy their sizes accumulate and that causes a stack overflow. – sharptooth Aug 17 '11 at 10:07
  • 1
    To be honest, I think that the question you linked to answers your question in entirety. It's clear from the answers there that there is no particularly good reason for this behaviour, and that it's down to how much work the compiler wants to do. – Lightness Races in Orbit Aug 17 '11 at 10:08
  • 1
    @sharptooth: You'd need a lot of small objects to hit 1M; if this is an issue, your function is far too big. Don't allocate large buffers on stack: _that_ is your "big picture". – Lightness Races in Orbit Aug 17 '11 at 10:09
  • @Tomalak Geret'kal: In real life you could have much smaller stack limit and objects with size like 50 kilobytes. That's the case where we found this problem. In fact, even smaller objects can cause the problem if there's a lot of them. – sharptooth Aug 17 '11 at 10:10
  • 3
    @sharptooth: Extreme edge case. Your function is too big. Decompose your program into smaller functions, and stop allocating large buffers on the stack. The stack is not meant for storing large objects. (And, please, there is no reason at all to be patronising. Just because we disagree doesn't mean I don't know about "real life".) – Lightness Races in Orbit Aug 17 '11 at 10:11
  • @Tomalak Geret'kal: Okay, I agree that it's an edge case. I never intended to mean you don't know real life. – sharptooth Aug 17 '11 at 10:17
  • I'm going to get some more downvotes here, but - it is clearly a case for using `alloca()` (ok, ok, if you're lucky enough to have a decent C99 compiler - then use a variable sized array). – SK-logic Aug 17 '11 at 10:25
  • @@Tomalak Geret'kal: Btw that strategy can lead to much worse cache locality in case of even small objects. I guess it's not very good. – sharptooth Aug 17 '11 at 10:25
  • @SK-logic: `alloca()` is only good for PODs, isn't it? – sharptooth Aug 17 '11 at 10:26
  • Looking at the assembly output of gcc, it is sometimes willing to do the optimization – PlasmaHH Aug 17 '11 at 10:28
  • @PlasmaHH: I tried codepad.org - the behavior is the same, just the stack size limit is higher. Did you include equivalent code that would prevent eliminating the branches and the allocations? – sharptooth Aug 17 '11 at 10:29
  • @sharptooth: Yes, I used rand(), on ideone it works nicely: http://ideone.com/vHLwP looking at the generated assembler code it is reserving 2068 bytes on the stack. Have you checked that you are not compiling with any stack smashing protection options enabled that could influence the stackk allocation behaviour? – PlasmaHH Aug 17 '11 at 10:36
  • @PlasmaHH: No, that won't do - you can't know for sure how much was really allocated this way specifically because the order of allocations is unspecified. My strategy was to deduct maximum stack size before the program crashes and then construct an `if-else` with one allocation in each branch. That was 150 megabytes on codepad.org - one allocation wouldn't crash, but two branches with 120/150 would. – sharptooth Aug 17 '11 at 10:41
  • @PlasmaHH: Here it is http://ideone.com/f7iY2 - when the first two lines are uncommented and all others are commented it runs fine. – sharptooth Aug 17 '11 at 10:47
  • @sharptooth: That will do; at least on linux since the stack grows automatically there, looking at the assembly output will suffice, and there you can see that the framesize is adjusted to be 2068 bytes – PlasmaHH Aug 17 '11 at 11:34

4 Answers4

2

I can only speculate that this optimization was deemed too unimportant by the compiler designers. Or perhaps, there is some subtle security reason.

BTW, on Windows, stack is reserved in its entirety when the thread starts execution, but is committed on as-needed basis, so you are not really spending much "real" memory even if you reserved a large stack.

Reserving a large stack can be a problem on 32-bit system, where having large number of threads can eat the available address space without really committing much memory. On 64-bit, you are golden.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
2

It could be down to your use of SecureZeroMemory. Try replacing it with regular ZeroMemory and see what happens- the MSDN page essentially indicates that SZM has some additional semantics beyond what it's signature implies, and they could be the cause of the bug.

Puppy
  • 144,682
  • 38
  • 256
  • 465
  • The only special thing with `SecureZeroMemory()` is that its contents is not shown to the compiler. If I replace it with `ZeroMemory()` the allocation just gets optimized away. Also the behavior is exactly the same when I use `memset()` followed by `printf()` on codepad.org and ideone.com – sharptooth Aug 17 '11 at 11:29
  • SecureZeroMemory not being optimized away shouldn't affect the compiler's ability to overlap the storage allocations. In both branches, the storage is securely zeroed. Nobody should care what happens after that, and the if-then-else cases are mutually exclusive by definition. Looks to me like a dumb compiler. – Ira Baxter Aug 17 '11 at 14:56
  • 1
    @Bo Persson: Whatever - equivalent code without `SecureZeroMemory()` shows the same behavior on codepad.org/ideone.com – sharptooth Aug 17 '11 at 14:56
1

The following code when compiled using GCC 4.5.1 on ideone places the two arrays at the same address:

#include <iostream>

int main()
{
  int x;
  std::cin >> x;

  if (x % 2 == 0)
  {
    char buffer[500 * 1024]; 
    std::cout << static_cast<void*>(buffer) << std::endl;
  }

  if (x % 3 == 0)
  {
    char buffer[700 * 1024]; 
    std::cout << static_cast<void*>(buffer) << std::endl;
  }
}

input: 6

output:
0xbf8e9b1c
0xbf8e9b1c

The answer is probably "use another compiler" if you want this optimization.

Clinton
  • 22,361
  • 15
  • 67
  • 163
-3

OS Pageing and byte alignment could be a factor. Also housekeeping may use extra stack along with space required for calling other functions within that function.

Ed Heal
  • 59,252
  • 17
  • 87
  • 127