46

I know C standards preceding C99 (as well as C++) says that the size of an array on stack must be known at compile time. But why is that? The array on stack is allocated at run-time. So why does the size matter in compile time? Hope someone explain to me what a compiler will do with size at compile time. Thanks.

The example of such an array is:

void func()
{
    /*Here "array" is a local variable on stack, its space is allocated
     *at run-time. Why does the compiler need know its size at compile-time?
     */
   int array[10]; 
}
Eric Z
  • 14,327
  • 7
  • 45
  • 69

6 Answers6

92

To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.

Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory, which is sequentially allocated to local variables and with a single index pointing to the current "high water mark". This index is the stack pointer.

When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.

This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry1. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.

Now, when all the local variables have a size that is fixed at compile-time, these offsets from the stack pointer are also fixed - so they can be coded directly into the instructions that the compiler emits. For example, in this function:

void foo(void)
{
    int a;
    char b[10];
    int c;

a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14.

However, when you introduce a variably-sized array, these offsets can no longer be computed at compile-time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N - and since N itself varies, it must also be stored somewhere. Often this means doing two accesses - one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.


1. In fact, "the value of the stack pointer at function entry" is such a useful value to have around, that it has a name of its own - the frame pointer - and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.

caf
  • 233,326
  • 40
  • 323
  • 462
  • caf: that's a good explanation +1. May be you can also introduce the concept of frame pointer – Chubsdad Dec 03 '10 at 02:35
  • @caf: sorry for nitpicking, but the terms is 'VLA' or variable length array rather than variable size array AFAIK – Chubsdad Dec 03 '10 at 03:01
  • @Chubsdad: VLA is indeed the term used in C, but I believe the concept/explanation is applicable to other compiled languages too, and the general term I've used seems clear enough. – caf Dec 03 '10 at 03:48
  • @caf: one note :) Things are simple as long as there is a single VLA in the function --> just put its storage at the end (after all there is no guarantee in the standard that variables will be layed out in the order they are declared). Unfortunately you still have to account for the multiple VLAs case... – Matthieu M. Dec 03 '10 at 08:08
  • @Mattheiu M: That is true, at least in the case with an explicit frame pointer. – caf Dec 03 '10 at 09:17
  • @MatthieuM. That function could also call no other functions, or have any variables created conditionally in it (ex `if (a==b){int c=1;}`). @caf Just wanted to add Frame Pointer is really just good for hand coding assembly or debugging. It is not really as described in your footnote (atleast on the common x86). – Joe McGrath Nov 25 '11 at 06:46
  • @JoeMcGrath: no compiler I know of updates the stack pointer on block entry. Sufficient space is reserved upfront on function entry. This means that if you have an infrequent huge stack requirement you are better of putting the body of the `if` in another function in case of stack pressure. But in truth, it's even worse, because no compiler I know performs Stack Coloring, meaning that they reserve the space as if all variables would be available at once, even when some are naturally mutually exclusive. I know that LLVM is thinking about Stack Coloring, but it requires liveness info and is hard – Matthieu M. Nov 25 '11 at 10:42
  • @MatthieuM. Very well, having VLA at end of stack would still cause issues if another function were to be called. I confirmed what you say as true by looking at gcc generated assembly. I am surprised because when I write my own assembly i reuse parts of the stack and conditionally push to it. I'm sure the compiler is still faster than me though. – Joe McGrath Nov 25 '11 at 15:49
  • @JoeMcGrath: Wasting stack space is not generally an issue. People having strong constraints on stack will avoid storing much on it anwyay. Therefore by making the modifications as infrequent as possible, you save some instructions. – Matthieu M. Nov 25 '11 at 16:45
  • @MatthieuM.: It occurs to me that minimising stack space will also minimise the function's cache footprint, which could certainly impact performance. – caf Nov 17 '12 at 09:35
  • @caf: I don't think so; the cache evictions occur depending on the amount of memory walked and where it maps in the cache. The case could be made that actually as the L1 cache maps up to 32k of memory, having those 32k contiguous helps preventing accidental overlap. – Matthieu M. Nov 17 '12 at 14:25
  • @MatthieuM.: The L1 cache deals in units of cachelines (typically 64 bytes). If re-using dead stack slots for locals with non-overlapping means that an additional cacheline of stack memory doesn't need to be pulled into L1, that will leave more of L1 for the remainder of the processes working set. – caf Nov 17 '12 at 20:15
10

It is not an extremely complicated thing to support, so the reason C89 doesn't allow this is not because it was impossible back then.

There are however two important reasons why it is not in C89:

  1. The runtime code will get less efficient if the array size is not known at compile-time.
  2. Supporting this makes life harder for compiler writers.

Historically, it has been very important that C compilers should be (relatively) easy to write. Also, it should be possible to make compilers simple and small enough to run on modest computer systems (by 80s standards). Another important feature of C is that the generated code should consistently be extremely efficient, without any surprises,

I think it is unfortunate that these values no longer hold for C99.

Johan Kotlinski
  • 25,185
  • 9
  • 78
  • 101
  • 2
    Well, the main reason it wasn't in C89 is because C89 was written on the premise of *"standardise the existing practice, don't invent"*, and it wasn't in existing C implementations. – caf Dec 03 '10 at 02:08
  • I would say a good part of C99 was standardizing existing practice too - VLA, compound literals, `long long`, named struct/union element initializers, etc. were all in gcc well before C99 if I remember right. Surely there's ugly stuff like `tgmath.h`, but I'd say the ugly inventions are a fairly small portion of what's new. The gcc team deserves at least as much credit for bloating the C standard as the committee does. :-) – R.. GitHub STOP HELPING ICE Dec 03 '10 at 02:34
6

The compiler has to generate the code to create the space for the frame on the stack to hold the array and other local local variables. For this it needs the size of the array.

Denis Hennessy
  • 7,243
  • 4
  • 25
  • 31
  • 1
    Obviously it does not need it in compile-time, since this was amended in C99. – Johan Kotlinski Dec 03 '10 at 01:09
  • Dealing with non-constant stack frame layout requires a good bit more complexity, and it's hard/impossible to do without a frame pointer. – R.. GitHub STOP HELPING ICE Dec 03 '10 at 01:29
  • @R.: It's definitely not *that* complicated, but yes, complexity and runtime performance decrease is the problem. – Johan Kotlinski Dec 03 '10 at 01:37
  • 7
    A lot of C's design is based on what was easy (for the compiler writers) to implement. That's why it's one of the lowest-level "high-level" languages around. Something didn't need to be impossible to be missing from early versions of C, merely non-trivial. – Laurence Gonsalves Dec 03 '10 at 01:38
  • @R..: It's a bit more painful, but whether or not you keep a frame pointer (at least in the usual sense) has very little effect on the complexity. – Stephen Canon Dec 03 '10 at 02:17
  • @Stephen: Well the address of all the local variables relative to the stack pointer will vary depending on the size of the vla, so you need to keep the offset/base address for the stack variables somewhere. I call that a frame pointer... – R.. GitHub STOP HELPING ICE Dec 03 '10 at 02:23
  • In a stack frame, the variable array would have to come after all the standard auto variables (in order to resolve the locations relative to base). We also have to clean up the stack. If we have a frame pointer it's easy. That's how alloca() works. But without a frame pointer the runtime would have to know how much to pop for each array. – seand Dec 03 '10 at 03:12
  • @seand: What happens if you have two or more VLAs? Now the addresses of all but one depend on the lengths of the others. – caf Dec 03 '10 at 03:50
  • @caf good point. The base addresses would have to be computed at runtime. – seand Dec 03 '10 at 03:56
  • I have never looked at the generated code in detail, but I always assumed the implementation would store non-vla local variables and *pointers to* the start of vlas at the top of the stack (assuming a downward-growing stack) where they're all at fixed offsets from the traditional frame pointer, and vlas/`alloca` storage below that. – R.. GitHub STOP HELPING ICE Dec 03 '10 at 15:14
4

Depends on how you allocate the array.

If you create it as a local variable, and specify a length, then it matters because the compiler needs to know how much space to allocate on the stack for the elements of the array. If you don't specify a size of the array, then it doesn't know how much space to set aside for the array elements.

If you create just a pointer to an array, then all you need to do is allocate the space for the pointer itself, and then you can dynamically create array elements during run time. But in this form of array creation, you're allocating space for the array elements in the heap, not on the stack.

Marvo
  • 17,845
  • 8
  • 50
  • 74
  • I agree that the "heap" implementation is much easier, however this means spurious heap allocation (which costs) and cache-locality suffers. – Matthieu M. Dec 03 '10 at 08:11
  • Oh, I wasn't advocating one or the other. Your requirements will dictate that. It's just another approach. – Marvo Dec 03 '10 at 19:29
4

In C++ this becomes even more difficult to implement, because variables stored on the stack must have their destructors called in the event of an exception, or upon returning from a given function or scope. Keeping track of the exact number/size of variables to be destroyed adds additional overhead and complexity. While in C it's possible to use something like a frame pointer to make freeing of the VLA implicit, in C++ that doesn't help you, because those destructors need to be called.

Also, VLAs can cause Denial of Service security vulnerabilities. If the user is able to supply any value which is eventually used as the size for a VLA, then they could use a sufficiently large value to cause a stack overflow (and therefore failure) in your process.

Finally, C++ already has a safe and effective variable length array (std::vector<t>), so there's little reason to implement this feature for C++ code.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 1
    vector isn't *quite* as nice as a VLA because it requires heap memory allocation which can be slow, especially in threaded environments. – Zan Lynx Dec 03 '10 at 23:45
  • 2
    @Zan: VLA isn't quite as nice as vector because it can blow the stack. Different strokes for different folks I guess. I'd rather have slow than process termination, TYVM :) (But +1 to comment) – Billy ONeal Dec 03 '10 at 23:59
3

Let's say you create a variable sized array on the stack. The size of the stack frame needed by the function won't be known at compile time. So, C assumed that some run time environments require this to be known up front. Hence the limitation. C goes back to the early 1970's. Many languages at the time had a "static" look and feel (like Fortran)

seand
  • 5,168
  • 1
  • 24
  • 37