8

I have question regarding how memory is managed for strong type Generics

List<int> ints1 = new List<int>();
ints1.Add(1); ints1.Add(2); ints1.Add(3);

int[] ints2 = new int[10]();
ints2.Add(6); ints2.Add(7); ints2.Add(8);

Question 1: I presume as ints1 is initialised with a new keyword (new List<int>()) it becomes a reference type. Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

Question 2: I know of a thing between List<int> and int[], List<int> can scale its size to any size at run time, which is restricted in int[] at compile time. So, if the values 1,2,3 and stored in stack, if a new item is added to List<int> say 4, it wouldn't be continuous to the first 3 right, so how will ints1 know the memory location of 4?

H H
  • 263,252
  • 30
  • 330
  • 514
Gururaj
  • 1,115
  • 2
  • 14
  • 17
  • 7
    `ints2.Add(6)` - does that compile for you? – Oded Dec 30 '11 at 11:09
  • 1
    Interesting fact. List is just an array that gets resized automatically (not every time something is added, but when the capacity is reached). – Ray Dec 30 '11 at 11:12
  • 5
    @Ray more accurately it *encapsulates* an array. It is not true to say that it "is" an array. – Marc Gravell Dec 30 '11 at 11:15
  • Re the stack question: each successive addition loads the list (ldloc), loads the value (ldc_i4), and calls a virtual method (callvirt) that **pops** both values; the position of the stack after each `Add` is the same. – Marc Gravell Dec 30 '11 at 11:21
  • Thank you Marc, can you please elaborate. first of all, are the elements of list 1,2,3 are stored on stack? and how will the list behave on addition of a new element – Gururaj Dec 30 '11 at 12:00
  • 1
    I think @MarcGravell's answer unnecessarily complicates things. He is referring to parameter-passing via the stack when methods are invoked. Gururaj appears to be asking how the elements of the list are stored (in the sense of 'represented') in memory. – Mike Nakis Dec 30 '11 at 12:38
  • @MikeNakis the OP asks about how those values behave on the stack; parameter passin is **the only time** they are on the stack here – Marc Gravell Dec 30 '11 at 12:56
  • @Gururaj no: the **ony** time the values shown are on the stack is when setting up the state to call Add. So *individually*, 1 is on the stack (call add), then 2 is on the stack (call add), then 3 is on the stack (call add). – Marc Gravell Dec 30 '11 at 12:58
  • The statement that `int[]` size is restricted at compile time is false. For example: `int[] xs = new int[new Random().Next(1024)];` this statement creates an array whose size is not known at compile time. The true distinction is that an array's size is fixed at the time of its creation. – phoog Dec 30 '11 at 16:17
  • 2
    I suspect that you believe the lie that "value types are always allocated on the stack". As you must have realized in asking this question, that doesn't make any sense at all. The list is a reference type, and the lifetime of the list might be *longer* than the lifetime of the stack frame that allegedly contains the ints! It is simply totally false that value types are allocated on the stack. **Storage is allocated based on the lifetime**. If the lifetime of the int is long, it's allocated on the long-term storage pool, if it is short, it's allocated on the short-term pool. – Eric Lippert Dec 30 '11 at 16:37

7 Answers7

14

I presume as ints1 is initialised with a new keyword new List<int>() it becomes a reference type

This presumption is incorrect. You can use the "new" keyword on a value type too!

int x = new int();

Using "new" does not make anything a reference type. You can use "new" with reference types or value types. What "new" indicates is that storage is going to be allocated and a constructor is going to be called.

In the case of using "new" on a value type, the allocated storage is temporary storage. A reference to that temporary storage is passed to the constructor, and then the now-initialized result is copied to its final destination, if there is one. ("new" is usually used with an assignment but it need not be.)

In the case of a reference type, storage is allocated twice: long-term storage is allocated for the instance and short-term storage is allocated for the reference to long-term storage of the instance. The reference is passed to the constructor, which initializes the long-term storage. The reference is then copied from short-term storage to its final destination, if there is one.

What makes List<int> a reference type is that List<T> is declared as a class.

Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

We've worked hard to make a memory manager that lets you not care where things are stored. Values are stored in either a short-term memory pool (implemented as the stack or registers) or a long-term memory pool (implemented as a garbage-collected heap). Storage is allocated depending on the known lifetime of the value. If the value is known to be short-lived then its storage is allocated on the short-term pool. If the value is not known to be short-lived then it must be allocated on the long-term pool.

The 1, 2, 3 owned by the list could live forever; we do not know whether that list is going to outlive the current activation frame or not. Therefore the memory to store the 1, 2, 3 is allocated on the long-term pool.

Do not believe the lie that "value types are always allocated on the stack". Obviously that cannot be true because then a class or array containing a number could not survive the current stack frame! Value types are allocated on the pool that makes sense for their known lifetime.

List<int> can scale its size to any size at runtime unlike int[]

Correct. It is educational to see how List<T> does that. It simply allocates an array of T larger than it needs. If it discovers that it guessed too small, it allocates a new, larger array and copies the old array contents to the new one. A List<T> is just a convenient wrapper around a bunch of array copies!

if the values 1,2,3 were stored in stack, and a new item 4 is added to the list, then it wouldn't be continuous to the first three.

Correct. That's one reason why the storage for values 1, 2, 3 are not allocated on the stack. The storage is actually an array allocated on the heap.

so how will the list know the memory location of item 4?

The list allocates an array that is too big. When you add a new item, it sticks it into unused space in the too-big array. When the array runs out of room, it allocates a new array.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • `List can scale its size to any size at runtime unlike int[]` is correct but NOT regarding `can scale its size to any size at runtime` because of the 2 GB limit (although this might "go away with .NET 4.5 or later) ? – Yahia Dec 30 '11 at 17:22
  • 2
    @Yahia: A list is actually a wrapper around an array, and an array cannot be more than 2GB in size max, correct. If you have several hundred million things in a list then you should be using a database, not an in-memory list. I have no idea if the 2GB object size limitation is going to be changed in the future; that's not my area of expertise and I am bad at predicting the future. – Eric Lippert Dec 30 '11 at 17:25
  • Thanks... I wasn't aiming at "future prediction" - just wanted to know whether it is correct that the 2 GB limit applies to List too which you confirmed :-) – Yahia Dec 30 '11 at 17:44
  • 5
    @Yahia: Right -- but of course that is an implementation detail. A list could be implemented as a tree of arrays, for example. Each individual array could be smaller than 2GB but their total could be larger (on x64 machines of course). – Eric Lippert Dec 30 '11 at 17:54
11

The "new" syntax is used to initialize both value-types and reference-types. The new list is created on the heap; the values are loaded on the stack (i.e. before they are added to the list), but once added, they are on the heap, in the int[] that underpins the list. Arrays are always on the heap.

The fact that they are copied to the array also answers part 2 I believe. The array is over-sized, and reallocated only when full.

Note; List<int> doesn't "become" a reference-type; it is always a reference-type.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
6

Memory management for generics (Generic collections) is exactly the same as for non-generic types.

Your ints1 list uses an array under the covers. So it is the same as for ints2 (when it has been corrected). In both cases a block of memory on the Heap is holding the int values.

The List<> class consists of an array, a int Count and an int Capacity property. When you Add() an element Count is incremented, when it passes Capacity a new array is allocated and the contents are copied.

H H
  • 263,252
  • 30
  • 330
  • 514
  • Hank, although it may be technically correct to say that memory management for generic collections is the same as for non-generic collections, in practice different things happen for different types, and generics have a clear advantage. This is because generics are smart enough not to box value types. See the comments that follow my answer. – Mike Nakis Dec 30 '11 at 13:07
  • @MikeNakis - that is the standard response when comparing generics and the older (ArrayList) collections. I think the OP is asking about something else. – H H Dec 30 '11 at 16:41
2

Question 1: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx says:

The List class is the generic equivalent of the ArrayList class. It implements the IList generic interface using an array whose size is dynamically increased as required.

This looks like a simple array, that is just reallocated if it overflows. AFAIKR the size is doubled on every reallocation - I researched that once, but can't remember what for.

The array is allocated on the managed heap, just as it would if you just declared it.

Richard Ev
  • 52,939
  • 59
  • 191
  • 278
Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
2
  1. List is a reference type no matter how you see it. All these types are allocated on the heap. I do not know whether the C# compiler is clever enough yet to figure out that an object which is not used outside of a method can be allocated on the stack, (Eric Lippert might be able to tell us,) but even if it does, that's something that you, as a programmer, do not need to worry about. It will just be an optimization that the compiler will do for you, without you ever noticing.

  2. An array of int is also a reference type and it is also allocated on the heap, it is just as simple as that. There is no point in wondering about some hypothetical fragmentation of arrays in the stack, because they are simply not allocated in the stack.

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • Thank you Mike, could you please correct my understanding stated below int[] obj = { 10, 20, 30 }; int[] is a class, a reference type, so it consumes 8byte header on heap. However, the its elements are int type which take 4 byte memory on stack and are contigeous. So in nutshell, 1. Obj takes 4 bytes of memory on stack to hold the address of reference type class int[] which is on heap 2. Reference type class int[] holds 8bytes of memory as reference object header on heap 3. Elements 10, 20, 30 consumes 4 bytes memory each on stack which are contiguous – Gururaj Dec 30 '11 at 11:50
  • In contrast ArrayList obj = new ArrayList(); Obj.Add(10); Obj.Add(20); Obj.Add(30); 1. Obj consumes 4 bytes of memory on stack to hold the address of reference type ArrayList() which is on heap 2. ArrayList()  which is a reference type class which consumes 8byte object header on heap 3. each of the elements of ArrayList() 10, 20, 30 are objects on heap which consumes 8byte each to hold the object header. 4. 10, 20, 30 are different objects and might not be contigeous on Heap 5. Each of the elements consumse 8Byte object header is because, they all get boxed internally. – Gururaj Dec 30 '11 at 11:53
  • In Contrast List obj = new List(); obj.Add(10); obj.Add(20); obj.Add(30); 1. Obj takes 4 bytes of memory on stack to hold the address of reference type class List which is on heap 2. Reference type class List holds 8bytes of memory as reference object header on heap 3. Elements 10, 20, 30 consumes 4 bytes memory each on stack which are contiguous 4. Each of these items doesn’t consume 8byte object header is because they use the Array[] class which doesn’t do boxing of object as its not required and all elements are stored on stack contiguously. – Gururaj Dec 30 '11 at 11:53
  • Regarding int[] that's all correct, though the object header is most likely considerably larger than 8 bytes. I do not know exactly how large it is, and it does not really matter, since it may change from version to version of the CLR. – Mike Nakis Dec 30 '11 at 12:00
  • Regarding ArrayList: Obj consumes 4 bytes of memory on stack to hold the address of reference type ArrayList which is on the heap. ArrayList is a reference type class which consumes an unknown number of bytes and contains a reference to an array of boxed ints. This array consumes an unknown number of bytes for header, and 4 bytes per reference to each boxed int in a contiguous block. Each boxed int consumes an unknown number of bytes. – Mike Nakis Dec 30 '11 at 12:03
  • Regarding List: Obj takes 4 bytes of memory on the stack to hold the address of reference type class List which is on the heap. Reference type class List takes up an unknown number of bytes, and contains a reference to an int[]. This int[] is exactly as the int[] you described earlier. – Mike Nakis Dec 30 '11 at 12:06
  • That's the beauty of C# generics: they provide real benefits. Note: when I said earlier that "each boxed int consumes an unknown number of bytes", rest assured that this number is greater than 4, greater even than 8, more likely close to 32. – Mike Nakis Dec 30 '11 at 12:08
  • Yes true, thanks. would you like to answer my initial question. how will the list behave when a new item is added to it? would it have created a some space initial space on stack which it plans to fill in for new items till it is full? once full it might relocate the entire contents to a new stack location(would this be a performance hit) appreciate your thoughts!!! – Gururaj Dec 30 '11 at 12:08
  • @Henk Holterman has already answered this. But please, forget the stack: the stack only holds a 4-byte reference to the List. Everything else from that point on is happening on the heap. I am sorry if I confused you by saying "this int[] is exactly as the int[] you described earlier"; There is just one little difference here: the reference to the int[] is of course contained within the List object, it is not on the stack. – Mike Nakis Dec 30 '11 at 12:14
  • 1
    Instances of classes, arrays and delegates are never allocated on the stack even if we could prove statically that no reference ever escapes the method. The benefit of stack allocation does not pay for the costs of escape analysis; in that scenario, the memory is going to be reclaimed by a gen zero collection soon anyway, which is pretty cheap. – Eric Lippert Dec 30 '11 at 16:34
  • @EricLippert I understand, except for one thing: aren't you weighing runtime benefits against compile-time cost here? (stack allocation vs escape analysis) – Mike Nakis Dec 30 '11 at 16:40
  • 2
    @MikeNakis: Compile-time costs *are* runtime costs. The jit compiler runs at runtime, and it is the jitter that owns the relationship with the memory manager regarding what gets stored where. Now, you might say, well, the C# compiler can perform the escape analysis and the jitter can consume it. So now the cost is not borne by the jitter, it is borne by the *verifier*; the runtime verifier has to verify that the C# compiler isn't lying to the runtime. None of this complexity is worth the cost; the gen zero gc is fast. – Eric Lippert Dec 30 '11 at 16:54
  • @EricLippert OK, you covered me completely. Thank you very much for taking the time to clear these things up for us. – Mike Nakis Dec 30 '11 at 16:55
  • @Eric There are times when I would cheerfully trade a significant increase in JIT cost for some whole program escape analysis. I'd be happy to do it as an .exe.config setting too. I get I'm in a (very) small minority here though. – ShuggyCoUk Dec 30 '11 at 17:31
1
List<int> ints1 = new List<int>();

I presume as ints1 is initialised with a new keyword (new List()) it becomes a reference type.

Let's clear up on some terminology first:

  • ints1 is not a type of any sort, but a variable, so it cannot "become a reference type", ever.
  • Variables have a static ("compile-time") type. ints1, for example, was declared to be of type List<int>. The static type of a variable never changes.
  • The type of the value or object that a variable refers to is called its dynamic ("run-time") type. After the above assignment, the dynamic type of ints1 is List<int>. The dynamic type of a variable may change whenever a new value or object is assigned to it.

In the case of ints1, both types happen to be equal, but this isn't always so:

ICollection<int> ints3 = new List<int>();

Here, the static type is ICollection<int> (always), and the dynamic type List<int> (after the assignment).


Answer to Question 1:

ints1.Add(1); ints1.Add(2); ints1.Add(3);

Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

The official answer is that this should be considered an implementation detail of List<T> and should not matter to you. All you need to know about is the operations you can perform on a List<T> (such as Add, Clear, etc.) and their characteristics (such as pre-/post-conditions, performance, etc.).

If you still want to know how this type works internally, let's start with this hint:

The List<T> class […] implements the IList<T> generic interface using an array whose size is dynamically increased as required.MSDN reference page for List<T>, section Remarks

Meaning, you can imagine a List<T> to be an array of type T[] that grows its capacity when needed. In the case of ints1, think of an int[]. Because int is a value type, the concrete values in the list (1, 2, 3) will be stored in-place. If it were a reference type, the values would each be stored in a separate location and the array would only contain references.

Note that I haven't mentioned the terms "stack", nor "heap" in the above paragraph. This is because these concepts are another implementation detail, namely one of the .NET platform, that you're not supposed to care too much about. (See Eric Lippert's blog article series, "The stack is an implementation detail", and e.g. my answer to a previous question, "Does new always allocate on the heap in C++ / C# / Java?")


Answer to Question 2:

List can scale its size to any size at run time, which is restricted in int[] at compile time. So, if the values 1,2,3 and stored in stack, if a new item is added to List say 4, it wouldn't be continuous to the first 3 right, so how will ints1 know the memory location of 4?

Again, you're not required to think about that. What matters about List<int> is whether its particular characteristics and operations that it offers suit your purposes. But I wouldn't really have answered your question if I left it at that, so let's briefly think about this anyway:

When you talk about the "stack", you probably mean an executing thread's call stack. AFAIK, virtual machines and programming languages that make use of such a data structure mostly use it for passing arguments to functions/methods, and perhaps also for storing values that stay local to a function. This is not the same as saying that the call stack also gets used for local variables, because a local variable may contain an object that is returned to the calling function, either via the return value, or via a ref/out parameter, etc.

To me, it seems unlikely that a List<int> object would ever end up on a call stack under normal circumstances (I'm not considering stackalloc or anything else related to unsafe contexts btw.). Remember that List<T> is a reference type: While it's possible, and perhaps quite likely, that the reference to the actual object ends up on the call stack, most likely the object's data itself won't.

A (perhaps naïve) implementation of IList<T> that uses fixed-size arrays for item storage, faced with the need to increase its capacity, could dynamically allocate a new, larger array, then copy the contents of the current array over to the new array, then abandon the old array in favor of the new one. That way, all items remain together in one place.

Community
  • 1
  • 1
stakx - no longer contributing
  • 83,039
  • 20
  • 168
  • 268
  • +1 for saving me from having to write an answer linking to "the stack is an implementation detail". But a small quibble on the terminology section: variables can't hold objects; they hold references to objects. – phoog Dec 30 '11 at 16:02
  • Also, your "perhaps naïve" implementation of `IList` describes the actual behavior of `System.Collections.Generic.List` very well. – phoog Dec 30 '11 at 16:12
  • @phoog, thanks for your correction, I've edited my answer. (Concerning the "perhaps naïve" judgment: I am cautious because my understanding of data structures comes mostly from practical experience, not from an in-depth, formal education. So, while *I* might implement `IList` this way, someone else might know of a more efficient method that still satisfies O(1) random access characteristics that people would expect from an implementation of `IList`.) – stakx - no longer contributing Dec 30 '11 at 22:16
  • 1
    I'm not questioning your judgment about the naïveté of the implementation. It's more or less accurate, and I've often wished for more "sophisticated" implementations as judged by one parameter or another. I just thought you might like to know that your speculation about the implementation was right on the money. – phoog Dec 30 '11 at 22:21
0

Question 1

Lists in C# internally contain arrays. List refers to a location on the heap, and in that location is an array storing all the values. So the values are stored on the heap. Same goes for arrays if it's part of a class.

Question 2

The stack is continuous, so pushing another int on the stack will mean that it's memory address is the location of the previous int + 4. The way Lists work when you adding items is that they create an array larger than what you need. When you reach the length of the array, there's an algorithm that creates a larger array and copies the current values over.

Another thing that may interest you are linked lists. Linked lists don't work with arrays internally, instead they work with nodes. Each node contains data and the location of the next node in the list. A doubly linked list contains nodes with all that and the location of the previous node in the list.

Robert Rouhani
  • 14,512
  • 6
  • 44
  • 59