41

I have been looking at .NET libraries using ILSpy and have come across List<T> class definition in System.Collections.Generic namespace. I see that the class uses methods like this one:

// System.Collections.Generic.List<T>
/// <summary>Removes all elements from the <see cref="T:System.Collections.Generic.List`1" />.</summary>
public void Clear()
{
    if (this._size > 0)
    {
        Array.Clear(this._items, 0, this._size);
        this._size = 0;
    }
    this._version++;
}

So, the Clear() method of the List<T> class actually uses Array.Clear method. I have seen many other List<T> methods that use Array stuff in the body.

Does this mean that List<T> is actually an undercover Array or List only uses some part of Array methods?

I know lists are type safe and don't require boxing/unboxing but this has confused me a bit.

Smi
  • 13,850
  • 9
  • 56
  • 64
Antun Tun
  • 1,507
  • 5
  • 20
  • 38
  • 22
    Where you expecting it to be a linked list or something? There's no magic in computers. A List is simply an array that automatically grows as needed, rather than needing explicit reallocation. – siride May 05 '13 at 17:49
  • 12
    There is *some* magic. – Theodoros Chatzigiannakis May 05 '13 at 18:13
  • 10
    If you already had ILSpy open, and you were already looking at the definition of List, why didn't you just look at the definition of "this._items" to answer your own question? – BTownTKD May 05 '13 at 20:07
  • Also note that the F# language uses the name [`ResizeArray`](http://msdn.microsoft.com/en-us/library/ee353447.aspx) for `List`. – Jeppe Stig Nielsen May 05 '13 at 20:19
  • I voted to close this because the question "Is List really an undercover Array?" doesn't really mean anything to programmers. Either it is an extension of Array or it encapsulates it, you can find out easily with your ILSpy tool. Try rewording the question if you would like it re-opened. – Jesse Webb May 06 '13 at 05:07
  • 2
    I'm favoriting this, not because I think it's a good question (you're halfway to the answer yourself), but because some of the best .NET people on SO have answered. – Jonathon Reinhart May 06 '13 at 07:08
  • 1
    What is the problem asking this? I've never used ILSpy, and I didn't know anything about how List is undercover. So this question helped me improve my knowledge, and I'm sure it helped and will help others too. – marcos.borunda May 07 '13 at 15:47
  • @marcos.borunda - Finally some guy that is not showing his frustration, thank you – Antun Tun Jul 15 '13 at 12:55

4 Answers4

46

The list class is not itself an array. In other words, it does not derive from an array. Instead it encapsulates an array that is used by the implementation to hold the list's member elements.

Since List<T> offers random access to its elements, and those elements are indexed 0..Count-1, using an array to store the elements is the obvious implementation.

David Heffernan
  • 601,492
  • 42
  • 1,072
  • 1,490
  • 1
    Well, I know List is not Array because we have two different classes but if it uses arrays under the hood that means it IS some kind of upgrade to already exsisting concept of array or I am wrong? – Antun Tun May 05 '13 at 18:37
  • 23
    It's not an "upgraded array". It is a list, implementing IList, and supporting List-style operations, such as enumeration, insertion, appending and so forth. The *internal implementation* of it is as an array, but that's just that - an internal implementation detail. If it was implemented as a linked list, it would have exposed the same interface and the same functionality, just different performance characteristics. – Avner Shahar-Kashtan May 05 '13 at 18:40
  • @AvnerShahar-Kashtan - if you buy a Mercedes motor and put it instead of a FIAT one inside a FIAT car, do you drive FIAT od Mercedes? I mean, I get it that it is not an Array, but functionality is implemented over Array methods. – Antun Tun May 05 '13 at 18:45
  • 6
    Oh, it's important to know a List's performance characteristics so you can use it properly. But *it is not an array*. It is not an upgraded array. It is a `List`, whose implementation is based on an array. – Avner Shahar-Kashtan May 05 '13 at 18:46
  • @AvnerShahar-Kashtan - OK, you win – Antun Tun May 05 '13 at 18:47
  • 1
    @AntunTun `List` can don anything that an array can, and do more. It has the same performance characteristics for access. But `List` allows insertion, appending and so on. Very broadly you can think of `List` as a more convenient array. – David Heffernan May 05 '13 at 18:50
  • @AntunTun: The confusion might have been caused by the fact that the phrase "*is a*" has a well-established meaning in object-oriented languages. `a` *is a* `B` if, and only if, `A`, the class of `a`, equals `B` or is a subclass of `B` (for example: `myDog` (of class `Dog`) *is an* `Animal`). – Heinzi May 06 '13 at 06:19
  • @Heinzi - You are wright, a lot of people got distraction from "is-a". – Antun Tun Jul 15 '13 at 12:58
33

This tends to surprise C++ programmers that know std::list. A linked list, covered in .NET as well with the LinkedList class. And has the same perf characteristics, O(1) for inserts and deletes.

You should however in general avoid it. Linked lists do not perform well on modern processors. Which greatly depend on the cpu caches to get reasonable performance with memory that's many times slower than the execution core. A simple array is by far the data structure that takes most advantage of the cache. Accessing an element gives very high odds that subsequent elements are present in the cache as well. That is not the case for a linked list, elements tend to be scattered throughout the address space, make a cache miss likely. They can be very expensive, as much as 200 cycles with the cpu doing nothing but waiting on the memory sub-system to supply the data.

But do keep the perf characteristics in mind, adding or removing an element that is not at the end of the List costs O(n), just like an array. And a large List can generate a lot of garbage as the array needs to be expanded, setting the Capacity property up front can help a lot to avoid that. More about that in this answer. And otherwise the exact same concerns for std::vector<>.

Community
  • 1
  • 1
Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
26

Yes, List<T> uses an array internally to store the items, although in most cases the array is actually larger than the number of elements in the collection -- it has some extra "padding" at the end so that you can add new items without it having to reallocate memory every time. It keeps track of the actual size of the collection with a separate field (you can see this._size in your generated code). When you add more elements than the current array has room for, it will automatically allocate a new larger array -- twice as big, I think -- and copy over all the existing elements.

If you're concerned about a List<T> using more memory than necessary, you can set the size of the array explicitly with the constructor override that accepts a capacity parameter, if you know the size in advance, or call the TrimExcess() method to make sure the array is (close to) to actual size of the collection.

Jeremy Todd
  • 3,261
  • 1
  • 18
  • 17
  • 3
    It is worth adding that the standard means of _growing_ an array dynamically, of doubling its size on each increase, remains _linear_ in the size of the array, and thus continues to provide O(1) _average_ insertion time. – Pieter Geerkens May 05 '13 at 18:09
  • 4
    @PieterGeerkens Not *average insertion* time. That would mean that in most cases, one can insert at any index in constant time. Instead, an over-allocating dynamic array offers *amortized worst-case* appending in Omega(1) time. That is, even in the worst possible sequence of operations on the array, the amortized time per `append` call is constant: While any one call may take linear time, after that you can append in O(1) time so often that it'll balance out eventually. –  May 05 '13 at 18:26
  • @delnan: You are correct of course; I had my terminology wrong. Thank you. It's a good day when I learn something new. I may favourite this post just for the sake of your explanation of the terminology. – Pieter Geerkens May 05 '13 at 18:28
0

Random access memory is an array, so in that sense all data structures from linked-lists to heaps and beyond, that rely on random-access to memory for their performance behaviour, are built on the array that is system memory. It is more a question of how many-levels of abstraction are in between.

Of course in a modern virtual memory machine, the random-access system memory is itself an abstraction built on a complicated virtual-memory model of multi-tier pipelined caches, non-cached RAM, and disk.

Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52
  • The term you're looking for is *random access machine*, which is a model of computation (and yes, indeed the one to which the commonly-known bounds refer), not a data structure. And while the memory hierarchy has immense impacts on practical performance which the RAM model doesn't explain, the asymptotic bounds hold -- it's just one more thing which falls under constant factors. –  May 05 '13 at 20:11
  • @delnan: I think that's what I said; just not quite so precisely. Did I get anything incorrect, as opposed to imprecise? – Pieter Geerkens May 05 '13 at 20:20
  • I'm not enough of an expert in the field to decide if your terminology is flat-out wrong, but it is confusing matters IMHO (data structure v. model of computation; practical performance v. asymptotic complexity). The basic idea is correct in any case. I'm just nitpicking. –  May 05 '13 at 20:22
  • @delnan: Your statement is also aimed slightly different than mine. I will do some research myself to follow up. Thanks for the heads up. – Pieter Geerkens May 05 '13 at 20:24