If we consider that non-generic collections store every member as object, then we can think that generics have also memory advantage. However, I didn't found any information about memory usage difference. Can anyone clarify about the point?
Sure. Let's consider an ArrayList
that contains int
s vs a List<int>
. Let's suppose there are 1000 int
s in each list.
In both, the collection type is a thin wrapper around an array -- hence the name ArrayList
. In the case of ArrayList
, there's an underlying object[]
that contains at least 1000 boxed ints. In the case of List<int>
, there's an underlying int[]
that contains at least 1000 int
s.
Why did I say "at least"? Because both use a double-when-full strategy. If you set the capacity of a collection when you create it then it allocates enough space for that many things. If you don't, then the collection has to guess, and if it guesses wrong and you need more capacity, then it doubles its capacity. So, best case, our collection arrays are exactly the right size. Worst case, they are possibly twice as big as they need to be; there could be room for 2000 objects or 2000 ints in the arrays.
But let's suppose for simplicity that we're lucky and there are about 1000 in each.
To start with, what's the memory burden of just the array? An object[1000]
takes up 4000 bytes on a 32 bit system and 8000 bytes on a 64 bit system, just for the references, which are pointer sized. An int[1000]
takes up 4000 bytes regardless. (There are also a few extra bytes taken up by array bookkeeping, but these costs are small compared to the marginal costs.)
So already we see that the non-generic solution possibly consumes twice as much memory just for the array. What about the contents of the array?
Well, the thing about value types is they are stored right there in their own variable. There is no additional space beyond those 4000 bytes used to store the 1000 integers; they get packed right into the array. So the additional cost is zero for the generic case.
For the object[]
case, each member of the array is a reference, and that reference refers to an object; in this case, a boxed integer. What's the size of a boxed integer?
An unboxed value type doesn't need to store any information about its type, because its type is determined by the type of the storage its in, and that's known to the runtime. A boxed value type needs to somewhere store the type of the thing in the box, and that takes space. It turns out that the bookkeeping overhead for an object in 32 bit .NET is 8 bytes, and 16 on 64 bit systems. That's just the overhead; we of course need 4 bytes for the int. But wait, it gets worse: on 64 bit systems, the box must be aligned to an 8 byte boundary, so we need another 4 bytes of padding on 64 bit systems.
Add it all up: Our int[]
takes about 4KB on both 64 and 32 bit systems. Our object[]
containing 1000 ints takes about 16KB on 32 bit systems, and 32K on 64 bit systems. So the memory efficiency of an int[]
vs an object[]
is either 4 or 8 times worse for the non-generic case.
But wait, it gets even worse. That's just size. What about access time?
To access an integer from an array of integers, the runtime must:
- verify that the array is valid
- verify that the index is valid
- fetch the value from the variable at the given index
To access an integer from an array of boxed integers, the runtime must:
- verify that the array is valid
- verify that the index is valid
- fetch the reference from the variable at the given index
- verify that the reference is not null
- verify that the reference is a boxed integer
- extract the integer from the box
That's a lot more steps, so it takes a lot longer.
BUT WAIT IT GETS WORSE.
Modern processors use caches on the chip itself to avoid going back to main memory. An array of 1000 plain integers is highly likely to end up in the cache so that accesses to the first, second, third, etc, members of the array in quick succession are all pulled from the same cache line; this is insanely fast. But boxed integers can be all over the heap, which increases cache misses, which greatly slows down access even further.
Hopefully that sufficiently clarifies your understanding of the boxing penalty.
What about non-boxed types? Is there a significant difference between an array list of strings, and a List<string>
?
Here the penalty is much, much smaller, since an object[]
and a string[]
have similar performance characteristics and memory layouts. The only additional penalty in this case is (1) not catching your bugs until runtime, (2) making the code harder to read and edit, and (3) the slight penalty of a run-time type check.