How can array require only a single heap allocation?... or what does it mean by single heap allocation
First of all, let's clarify what we mean by "heap" vs "stack".
Most programming environments today are stack-based. As you run a program, each time you call a method a new entry is pushed onto a special stack provided for your program. This stack entry (or frame) tells the system where to look for the method's executable code, what arguments were passed, and exactly where to return to in the calling code after the method exits. When a method finishes, it's entry is removed (popped) from the stack, so the program can go back to the previous method. When the stack is empty, the program has finished.1 There is often support for this special stack directly in the CPU.
The memory for the stack is allocated when the program is first launched, which means the stack itself has a fixed (limited) size. This is where "Stack Overflows" come from; get too deep down too many method calls, and the stack will run out of space. Each frame on the stack also has a certain amount of space for local variables for the method, and this is the memory we're talking about when we say value types live on the stack. The local variables stored on the stack do not require new allocations; the memory is already there. Just remember: this only applies in the context of local variables in a method.
The heap, on the other hand, is memory not automatically granted to the program. It is memory the program must request above and beyond it's core allocation. Heap memory has to be managed more carefully (so it doesn't leak — but we have a garbage collector to help with this), but there is (usually) much more of it available. Because it has to be granted by the operating system on request, initial allocations for the heap are also a little slower than memory used from the stack.2 For reference types, you can think of the new
keyword as requesting a new heap memory allocation.
As a broad generalization, we say reference types are allocated on the heap, and value types are allocated on the stack (though there are plenty of exceptions for this3).
Now we understand this much, we can start to look at how .Net handles arrays.
The core array type itself is a reference type. What I mean is, for any given type T
, the T
may (or may not) be a value type, but an array of T
(T[]
) is always a reference type. In the "stack vs heap" context, this means creating a new array is a heap allocation, even if T
is a value type. Arrays in .Net also have a fixed size4.
An additional attribute of value types is they also have a known/fixed size, based on the members. Therefore, an array of value types has a fixed number of elements, each with a known fixed size. That's enough information so allocating a new array of value types will get all the space for the array object and it's elements in single heap allocation. The value of each item (not just a reference) is held right there with the array's core memory. Now we have a bunch of value-type objects, but their memory is on the heap, rather than the stack.
This can be further complicated by a value type with one or more reference type members. In this situation, the space for the value type is allocated as normal, but the the part of the value for the reference members is just a reference. It still requires separate allocations or assignments to populate those reference members.
For an array holding reference types, the initial heap allocation for the array still allocates space for all the elements, but space reserved for each element is only enough for the reference. That is, initially each element in the array is still null
. To populate the array you must still set those references to objects in memory, either by assigning existing objects or allocating new ones.
Finally, just as we were able to use arrays to get a value-type onto the heap, instead of the stack, there are also ways to force reference types to allocate from the stack. However, generally you should not do this unless you really understand the full implications.
1) There are different conventions on exactly when a frame is pushed/popped for each method call, depending on the platform, compiler configuration, and more, so only look at this paragraph for the general idea; the exact specifics will be incorrect in some particulars on any given platform.
2) For future reading, it is also useful to understand how programs handle addressing for heap memory.
3) Eric Lippert has an excellent write-up of this topic.
4) That is, arrays in .Net they are true arrays in the full formal computer science sense, unlike the associative array-like collection types in many other platforms. .Net has these, too, but it calls them what they are: collections rather than arrays.