
I'm debugging a rather weird stack overflow, supposedly caused by allocating overly large variables on the stack, and I'd like to clarify the following.

Suppose I have the following function:

void function()
{
    char buffer[1 * 1024];
    if( condition ) {
        char buffer[1 * 1024];
        doSomething( buffer, sizeof( buffer ) );
    } else {
        char buffer[512 * 1024];
        doSomething( buffer, sizeof( buffer ) );
    }
}

I understand that it's compiler-dependent and also depends on what the optimizer decides, but what is the typical strategy for allocating memory for these local variables?

Will the worst case (1 + 512 kilobytes) be allocated immediately when the function is entered, or will 1 kilobyte be allocated first and then, depending on the condition, an additional 1 or 512 kilobytes be allocated?

sharptooth
  • I think it's typical to allocate all possibly needed stack space all at once. – Gabe Aug 17 '11 at 07:12
  • In that case, it would probably be best to split this up into separate functions so each has its own stack space and your main `function()` doesn't allocate it all at once. – Mike Bailey Aug 21 '11 at 19:26
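
A rough sketch of that splitting approach (the helper names and the doSomething signature are illustrative; it only helps if the compiler does not inline the helpers back into the caller, hence the noinline hints):

#include <cstddef>

void doSomething(char*, std::size_t);   // assumed signature, stands in for the real callee

#if defined(_MSC_VER)
  #define NOINLINE __declspec(noinline)
#else
  #define NOINLINE __attribute__((noinline))
#endif

// Each helper gets its own stack frame, so the 512 KB buffer only occupies
// stack space while handleLarge() is actually executing.
NOINLINE static void handleSmall()
{
    char buffer[1 * 1024];
    doSomething(buffer, sizeof(buffer));
}

NOINLINE static void handleLarge()
{
    char buffer[512 * 1024];
    doSomething(buffer, sizeof(buffer));
}

void function(bool condition)
{
    if (condition)
        handleSmall();
    else
        handleLarge();
}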

5 Answers

On many platforms/ABIs, the entire stack frame (including memory for every local variable) is allocated when you enter the function. On others, it's common to push/pop memory bit by bit, as it is needed.

Of course, in cases where the entire stack frame is allocated in one go, different compilers might still decide on different stack frame sizes. In your case, some compilers would miss an optimization opportunity and allocate distinct memory for every local variable, even the ones in mutually exclusive branches of the code (both the 1 * 1024 array and the 512 * 1024 one in your case). A better optimizing compiler would allocate only the maximum memory required by any single path through the function (the else path in your case, so one 512 KB block shared by the two branch-local buffers should be enough). If you want to know what your platform does, look at the disassembly.

But it wouldn't surprise me to see the entire chunk of memory allocated immediately.
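
If you do want to check, a probe like the one below is enough (a sketch: the function name, the assumed doSomething signature and the exact compile command are illustrative, assuming GCC or Clang on x86-64). Compile with something like "g++ -O2 -S probe.cpp" and look at the prologue of probe: a single large adjustment of the stack pointer (on the order of 512 KB or more) means the whole frame is reserved up front, while separate adjustments inside the branches would mean the space is carved out lazily.

#include <cstddef>

void doSomething(char*, std::size_t);   // defined in another translation unit so the buffers are not optimized away

void probe(bool condition)
{
    char outer[1 * 1024];
    if (condition) {
        char small[1 * 1024];
        doSomething(small, sizeof(small));
    } else {
        char big[512 * 1024];
        doSomething(big, sizeof(big));
    }
    doSomething(outer, sizeof(outer));
}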

jalf
  • I can confirm that Visual Studio, at least in the case of exceptions, allocates storage for all exceptions possibly thrown from a function when you enter it. Recursing such a function is hell for your stack. Try it out - make a recursive function that throws a 2k object in its 4000th recursion - it'll crash with a stack overflow. – dascandy Aug 17 '11 at 08:57
  • In general Visual C++ manages to re-use the same stack space for the case above with then/else. It still allocates the space in the function prologue, but at least doesn't allocate space for both buffers. If you really want to postpone allocation you should look at alloca(), or std::vector<>. –  Aug 18 '11 at 06:58

I checked on LLVM:

void doSomething(char*,char*);

void function(bool b)
{
    char b1[1 * 1024];
    if( b ) {
       char b2[1 * 1024];
       doSomething(b1, b2);
    } else {
       char b3[512 * 1024];
       doSomething(b1, b3);
    }
}

Yields:

; ModuleID = '/tmp/webcompile/_28066_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

define void @_Z8functionb(i1 zeroext %b) {
entry:
  %b1 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b2 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b3 = alloca [524288 x i8], align 1            ; <[524288 x i8]*> [#uses=1]
  %arraydecay = getelementptr inbounds [1024 x i8]* %b1, i64 0, i64 0 ; <i8*> [#uses=2]
  br i1 %b, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %arraydecay2 = getelementptr inbounds [1024 x i8]* %b2, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay2)
  ret void

if.else:                                          ; preds = %entry
  %arraydecay6 = getelementptr inbounds [524288 x i8]* %b3, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay6)
  ret void
}

declare void @_Z11doSomethingPcS_(i8*, i8*)

You can see the three allocas at the top of the function.

I must admit I am slightly disappointed that b2 and b3 are not folded together in the IR, since only one of them will ever be used.

Matthieu M.
  • Whoa, it's rather disappointing - it's even worse than the worst case I expected. – sharptooth Aug 17 '11 at 08:04
  • I just tested on Visual C++ 10 - the behavior is the same as you see on LLVM. Very sad. I didn't expect the allocation algorithm to be that bad. – sharptooth Aug 17 '11 at 09:10
  • @sharptooth: I just remembered that it makes sense NOT to merge the variables in the LLVM IR, as it facilitates analysis; I do not know whether they will be merged in the final assembly or not. I have asked about it on llvm-dev and will update the answer following the results. – Matthieu M. Aug 17 '11 at 12:07

This optimization is known as "stack coloring", because you're assigning multiple stack objects to the same address. This is an area that we know LLVM can improve. Currently LLVM only does this for stack objects created by the register allocator for spill slots. We'd like to extend this to handle user stack variables as well, but we need a way to capture the lifetime of the value in IR.

There is a rough sketch of how we plan to do this here: http://nondot.org/sabre/LLVMNotes/MemoryUseMarkers.txt

Implementation work on this is underway; several pieces are already implemented in mainline.

-Chris


Your local (stack) variables are allocated in the same space as stack frames. When the function is called, the stack pointer is adjusted to "make room" for the stack frame, and this is typically done in a single adjustment on entry. If your local variables consume more stack than is available, you'll encounter a stack overflow.

~512 kbytes is really too large for the stack in any case; you should allocate this on the heap using std::vector.
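
For illustration, a minimal sketch of that approach applied to the function from the question (the doSomething signature is an assumption):

#include <cstddef>
#include <vector>

void doSomething(char*, std::size_t);   // assumed signature, for illustration only

void function(bool condition)
{
    char buffer[1 * 1024];                     // the small outer buffer can stay on the stack
    if (condition) {
        std::vector<char> inner(1 * 1024);
        doSomething(&inner[0], inner.size());
    } else {
        std::vector<char> inner(512 * 1024);   // the 512 KB block lives on the heap, freed on scope exit
        doSomething(&inner[0], inner.size());
    }
    doSomething(buffer, sizeof(buffer));
}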

tenfour

As you say, it is compiler-dependent, but you could consider using alloca to overcome this. The memory would still be allocated on the stack and still be freed automatically (when the function returns), but you take control over when and whether the stack space is allocated.

While use of alloca is typically discouraged, it does have its uses in situations such as the above.
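
A minimal sketch of what that could look like (the header and the doSomething signature are assumptions; MSVC spells it _alloca and declares it in <malloc.h>). Note that memory obtained from alloca is released when the function returns, not at the end of the enclosing block:

#include <alloca.h>    // glibc; other platforms declare alloca elsewhere
#include <cstddef>

void doSomething(char*, std::size_t);   // assumed signature, for illustration only

void function(bool condition)
{
    if (condition) {
        char* buffer = static_cast<char*>(alloca(1 * 1024));
        doSomething(buffer, 1 * 1024);
    } else {
        // The 512 KB is only taken from the stack if this branch actually runs.
        char* buffer = static_cast<char*>(alloca(512 * 1024));
        doSomething(buffer, 512 * 1024);
    }
}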

SmacL
  • Wouldn't a better option be to just put 512kb arrays on the heap instead? Even with alloca, you're using an awful lot of stack space. – jalf Aug 17 '11 at 09:04
  • @jalf, most probably for 512k. For smaller amounts I could envisage it making sense in certain recursive situations, or performance-critical situations where the heap can be slow. – SmacL Aug 17 '11 at 10:53
  • true in theory. In practice, for performance critical situations, I'd just preallocate those 512kb on the heap. ;) – jalf Aug 17 '11 at 10:56
  • Simplest workaround would be to put local variables into separate "sub-functions". A branch in the calling function would determine which sub-function needs to be called, and the stack pointer would be moved only by the amount needed by the local variable in that sub-function. – Branko Dimitrijevic Aug 17 '11 at 10:59
  • @jalf Heap allocation is far more expensive than stack "allocation". If you need multiple objects, you'll pay a much higher price if you allocate them on the heap. If you only need one object, just make it static. – Branko Dimitrijevic Aug 17 '11 at 11:02
  • Heap is not that much more expensive, and it must be weighed against the consequence of stack allocation causing an unpredictable, unrecoverable crash. – tenfour Aug 17 '11 at 13:07
  • ok, there are like 4 different, unrelated arguments going on here now. Heap vs stack, heap vs static, static vs stack. This is just silly. – jalf Aug 18 '11 at 07:05
  • @jalf, my opening point about use of alloca was simply that it can be the more efficient way of controlling when stack variables are allocated, as per the opening question and borne out by Matthieu M.'s answer. Whether or not it is the appropriate technique is entirely dependent on context. It was your good self that brought the heap into the discussion and the ensuing scope creep ;) – SmacL Aug 18 '11 at 11:01