Memory usage difference between Generic and Non-generic collections in .NET

Question

I have been reading about collections in .NET. As is well known, generic collections have some advantages over non-generic ones: they are type-safe, and there is no casting and no boxing/unboxing. That's why generic collections have a performance advantage.

If we consider that non-generic collections store every member as object, then we might expect generics to have a memory advantage as well. However, I didn't find any information about the difference in memory usage.

Can anyone clarify this point?

*In general*, write the clearest code that does the job you need. Then, *if* you have memory concerns, *profile* the code to find out *where* you're using memory, which types, etc, and *then* explore alternatives. Don't try to learn billions of "always use X/never use Y" rules. — Damien_The_Unbeliever, Nov 08 '17 at 14:49
[This](https://stackoverflow.com/questions/8678795/generics-memory-management) could be quite interesting for you. — L. Guthardt, Nov 08 '17 at 14:53
An `object` will always be bigger than, say, an `int`. [Related](https://stackoverflow.com/q/10655829/4137916). — Jeroen Mostert, Nov 08 '17 at 14:54
A `List<byte>` takes 1 byte per element, an ArrayList takes 12 bytes in 32-bit mode and 24 bytes in 64-bit mode to store a byte. It is not sensible to consider a 2400% storage overhead as ever useful, simply forget that .NET 1.0 ever existed. — Hans Passant, Nov 08 '17 at 15:15
These differences are officially documented in the Performance Considerations (under Remarks) section of [MSDN: List(T) Class](https://msdn.microsoft.com/en-us/library/6sh2ey19.aspx) — Brian, Nov 08 '17 at 21:29

score 20 · Accepted Answer · answered Nov 08 '17 at 15:12

If we consider that non-generic collections store every member as object, then we might expect generics to have a memory advantage as well. However, I didn't find any information about the difference in memory usage. Can anyone clarify this point?

Sure. Let's consider an ArrayList that contains ints vs a List<int>. Let's suppose there are 1000 ints in each list.

In both, the collection type is a thin wrapper around an array -- hence the name ArrayList. In the case of ArrayList, there's an underlying object[] that contains at least 1000 boxed ints. In the case of List<int>, there's an underlying int[] that contains at least 1000 ints.

Why did I say "at least"? Because both use a double-when-full strategy. If you set the capacity of a collection when you create it then it allocates enough space for that many things. If you don't, then the collection has to guess, and if it guesses wrong and you need more capacity, then it doubles its capacity. So, best case, our collection arrays are exactly the right size. Worst case, they are possibly twice as big as they need to be; there could be room for 2000 objects or 2000 ints in the arrays.
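The effect of setting the capacity up front can be observed through the `Capacity` property of `List<int>`. The following is a minimal sketch; the class and method names are made up for illustration, and the exact growth sequence of a list created without a capacity is an implementation detail that may differ between runtime versions, so the code only relies on the capacity being at least the element count.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Fills a list created with the default constructor and returns
    // the capacity of its backing array afterwards.
    public static int CapacityAfterAdding(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        return list.Capacity;
    }

    // Fills a list whose capacity was requested up front, so the
    // backing array never has to grow.
    public static int PresizedCapacity(int count)
    {
        var list = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        return list.Capacity;
    }

    public static void Main()
    {
        // Some value >= 1000; the exact number depends on the growth policy.
        Console.WriteLine(CapacityAfterAdding(1000));
        // Exactly the capacity that was requested.
        Console.WriteLine(PresizedCapacity(1000));
    }
}
```

If the final size is known in advance, passing it to the constructor avoids both the unused slack and the intermediate array copies.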

But let's suppose for simplicity that we're lucky and there are about 1000 in each.

To start with, what's the memory burden of just the array? An object[1000] takes up 4000 bytes on a 32 bit system and 8000 bytes on a 64 bit system, just for the references, which are pointer sized. An int[1000] takes up 4000 bytes regardless. (There are also a few extra bytes taken up by array bookkeeping, but these costs are small compared to the marginal costs.)

So already we see that the non-generic solution possibly consumes twice as much memory just for the array. What about the contents of the array?

Well, the thing about value types is they are stored right there in their own variable. There is no additional space beyond those 4000 bytes used to store the 1000 integers; they get packed right into the array. So the additional cost is zero for the generic case.

For the object[] case, each member of the array is a reference, and that reference refers to an object; in this case, a boxed integer. What's the size of a boxed integer?

An unboxed value type doesn't need to store any information about its type, because its type is determined by the type of the storage it's in, and that's known to the runtime. A boxed value type needs to store the type of the thing in the box somewhere, and that takes space. It turns out that the bookkeeping overhead for an object in 32 bit .NET is 8 bytes, and 16 on 64 bit systems. That's just the overhead; we of course need 4 bytes for the int. But wait, it gets worse: on 64 bit systems, the box must be aligned to an 8 byte boundary, so we need another 4 bytes of padding on 64 bit systems.

Add it all up: Our int[] takes about 4KB on both 64 and 32 bit systems. Our object[] containing 1000 boxed ints takes about 16KB on 32 bit systems (4 bytes of reference plus a 12 byte box per element), and 32KB on 64 bit systems (8 bytes of reference plus a 24 byte box per element). So the non-generic case uses either 4 or 8 times as much memory as the int[].
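The arithmetic can be written out as a small calculation. This is only a sketch that plugs in the per-element figures quoted in this answer (reference size, object header, payload, padding); it estimates sizes rather than measuring them, it ignores the small fixed array bookkeeping, and the class and method names are hypothetical.

```csharp
using System;

public static class BoxedIntFootprint
{
    // An int[] stores each int inline: 4 bytes per element.
    public static long IntArrayBytes(int count)
    {
        return count * 4L;
    }

    // An object[] of boxed ints pays for the reference slot in the
    // array plus the separately allocated box it points to.
    public static long BoxedIntArrayBytes(int count, bool is64Bit)
    {
        long referenceSize = is64Bit ? 8 : 4;  // pointer-sized array slot
        long objectHeader = is64Bit ? 16 : 8;  // per-object bookkeeping
        long payload = 4;                      // the int inside the box
        long padding = is64Bit ? 4 : 0;        // 8-byte alignment on 64 bit
        return count * (referenceSize + objectHeader + payload + padding);
    }

    public static void Main()
    {
        Console.WriteLine(IntArrayBytes(1000));              // 4000
        Console.WriteLine(BoxedIntArrayBytes(1000, false));  // 16000
        Console.WriteLine(BoxedIntArrayBytes(1000, true));   // 32000
    }
}
```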

But wait, it gets even worse. That's just size. What about access time?

To access an integer from an array of integers, the runtime must:

- verify that the array is valid
- verify that the index is valid
- fetch the value from the variable at the given index

To access an integer from an array of boxed integers, the runtime must:

- verify that the array is valid
- verify that the index is valid
- fetch the reference from the variable at the given index
- verify that the reference is not null
- verify that the reference is a boxed integer
- extract the integer from the box

That's a lot more steps, so it takes a lot longer.
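In source code the extra steps hide behind a single cast. The sketch below (hypothetical class and method names) sums the same numbers through both collection types; the comments map each line to the steps listed above, and it makes no claim about actual timings, which should be measured with a profiler or benchmark.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class AccessDemo
{
    public static int SumGeneric(List<int> values)
    {
        int sum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            sum += values[i];          // index check, then the int is read directly
        }
        return sum;
    }

    public static int SumNonGeneric(ArrayList values)
    {
        int sum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            object boxed = values[i];  // index check, then a reference is read
            sum += (int)boxed;         // null check, type check, copy out of the box
        }
        return sum;
    }

    public static void Main()
    {
        var generic = new List<int>();
        var nonGeneric = new ArrayList();
        for (int i = 1; i <= 1000; i++)
        {
            generic.Add(i);
            nonGeneric.Add(i);         // each Add boxes the int
        }
        Console.WriteLine(SumGeneric(generic));        // 500500
        Console.WriteLine(SumNonGeneric(nonGeneric));  // 500500
    }
}
```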

BUT WAIT IT GETS WORSE.

Modern processors use caches on the chip itself to avoid going back to main memory. An array of 1000 plain integers is highly likely to end up in the cache so that accesses to the first, second, third, etc, members of the array in quick succession are all pulled from the same cache line; this is insanely fast. But boxed integers can be all over the heap, which increases cache misses, which greatly slows down access even further.

Hopefully that sufficiently clarifies your understanding of the boxing penalty.

What about non-boxed types? Is there a significant difference between an array list of strings, and a List<string>?

Here the penalty is much, much smaller, since an object[] and a string[] have similar performance characteristics and memory layouts. The only additional penalty in this case is (1) not catching your bugs until runtime, (2) making the code harder to read and edit, and (3) the slight penalty of a run-time type check.
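Points (1) and (3) can be illustrated with strings. In this sketch (the names are invented for the example), an ArrayList happily accepts an int among the strings, and the mistake only surfaces as an `InvalidCastException` when the element is read back; the equivalent `Add` on a `List<string>` would be rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ReferenceTypeDemo
{
    // Tries to read every element as a string and add up the lengths.
    // Returns false if any element turns out not to be a string.
    public static bool TryTotalLength(ArrayList items, out int total)
    {
        total = 0;
        try
        {
            foreach (object item in items)
            {
                string s = (string)item;  // run-time type check on every read
                total += s.Length;
            }
            return true;
        }
        catch (InvalidCastException)
        {
            total = 0;
            return false;
        }
    }

    public static void Main()
    {
        var names = new ArrayList { "alpha", "beta" };
        names.Add(42);                    // compiles: ArrayList accepts any object

        int total;
        Console.WriteLine(TryTotalLength(names, out total));  // False

        var typed = new List<string> { "alpha", "beta" };
        // typed.Add(42);                 // would not compile
        Console.WriteLine(typed.Count);   // 2
    }
}
```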

Excellent explanation. Thank You! — Shyju, Nov 12 '17 at 00:51

MakePeaceGreatAgain · Answer 2 · 2017-11-08T15:24:58.093

then we might expect generics to have a memory advantage as well

This assumption is false in general; it only applies to value types. Consider this:

new ArrayList { 1, 2, 3 };

This will implicitly convert every integer to object (known as boxing) in order to store it in your ArrayList. This is what causes the memory overhead here, because an object is certainly bigger than a simple int.

For reference types, however, there's no difference, as there's no need for boxing.

Choosing one or the other shouldn't be driven by performance or memory concerns. Instead, you should ask yourself what you want to do with the results. In particular, if you know the type(s) stored in your collection at compile time, there's no reason not to give the compiler this information by using the right generic type argument.

In any case, you should always use generic collections instead of non-generic ones because of the type safety mentioned above.

EDIT: Your actual question, whether to use a non-generic collection or its generic version, is rather pointless: always use the generic one. But not because of its memory usage. See this:

ArrayList a = new ArrayList { 1, 2, 3};

vs.

List<object> a = new List<object> { 1, 2, 3 };

Both lists will consume the same amount of memory, although the second one is generic. That's because they both box your integers into object. So whether a collection is generic does not by itself determine its memory usage.
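That a `List<object>` still boxes can be checked directly. The following is a small sketch with hypothetical helper names: the elements come back as references to boxed ints, and boxing the same value twice produces two separate heap objects.

```csharp
using System;
using System.Collections.Generic;

public static class ListOfObjectDemo
{
    // True when the element at the index is a reference to a boxed int.
    public static bool StoresBoxedInt(List<object> list, int index)
    {
        return list[index] is int;
    }

    // Boxing the same value twice allocates two distinct objects.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;
        object second = value;
        return !object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        var a = new List<object> { 1, 2, 3 };        // each int is boxed on Add
        Console.WriteLine(StoresBoxedInt(a, 0));     // True
        Console.WriteLine(BoxesAreDistinct(1));      // True
    }
}
```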

On the other hand, for reference types there's no memory difference at all:

ArrayList a = new ArrayList { myInstance, anotherInstance };

vs.

List<MyClass> a = new List<MyClass> { myInstance, anotherInstance };

will produce the same memory footprint. However, the second one is far easier to maintain, as you can work with the instances directly without casting them.

Really? I'd have thought a `List<byte>` will have a `byte[]` storing its values internally vs a `List<object>` having an `object[]` which necessarily contains the 4 or 8 byte references, *plus* the boxing overheads. — Damien_The_Unbeliever, Nov 08 '17 at 14:46

score 0 · Answer 3 · answered Nov 08 '17 at 14:59

Let's assume we have this statement:

int valueType = 1;

So now we have a value on the stack, as follows:

stack

valueType = 1

Now consider that we then do this:

object boxingObject = valueType;

Now we have two more things stored in memory: the reference boxingObject on the stack, and a boxed copy of the value 1 on the heap:

stack

boxingObject (reference to the box)

heap

boxed copy of the value 1

So when a value type is boxed, extra memory is used, as the Microsoft docs state:

Boxing a value type allocates an object instance on the heap and copies the value into the new object.

See this link for full information.
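The "copies the value" part of that quote can be seen in a few lines. This is a minimal sketch that reuses the variable names from the statements above; the class and method names are made up for the example.

```csharp
using System;

public static class BoxingCopyDemo
{
    public static int ValueInBoxAfterChangingOriginal()
    {
        int valueType = 1;
        object boxingObject = valueType;  // allocates a box and copies 1 into it
        valueType = 2;                    // changes only the original variable
        return (int)boxingObject;         // unboxing reads the copy in the box
    }

    public static void Main()
    {
        Console.WriteLine(ValueInBoxAfterChangingOriginal());  // 1
    }
}
```

The box keeps its own copy, so the two storage locations are independent after the assignment.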
