
The C# generic list System.Collections.Generic.List<T> is implemented using a growable array as the backing store, in a manner similar to most array-list-based implementations. It is clear that this provides huge benefits when performing random (indexed) access over, e.g., lists that are implemented as linked lists.

I am wondering, however, why the choice was not made to implement it as a circular array. Such an implementation would have the same O(1) performance for indexed random access and for appending to the end of the list, but would provide additional benefits, such as allowing O(1) prepend operations (i.e. inserting a new element at the front of the list) and, on average, halving the amount of element copying needed for random insertions and deletions.
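To make that last claim concrete, here is a rough sketch of the kind of structure I have in mind (the type and member names are made up, growth is kept to powers of two, and bounds checking is minimal): inserting shifts whichever side of the insertion point holds fewer elements, so it never moves more than half of them.

using System;

// Hypothetical sketch, not an existing BCL type: a circular buffer whose
// Insert shifts whichever side of the insertion point is shorter.
public class CircularBuffer<T>
{
    private T[] items = new T[8];            // capacity is always a power of two
    private int head, count;                 // head = physical index of logical element 0

    private int Mask { get { return items.Length - 1; } }
    private int Physical(int logical) { return (head + logical) & Mask; }

    public int Count { get { return count; } }
    public T this[int index] { get { return items[Physical(index)]; } } // bounds checks omitted

    public void Insert(int index, T item)
    {
        if ((uint)index > (uint)count) throw new ArgumentOutOfRangeException("index");
        if (count == items.Length) Grow();

        if (index < count - index)
        {
            // Fewer elements in front of the insertion point: shift them one slot left.
            head = (head - 1) & Mask;
            for (int i = 0; i < index; i++) items[Physical(i)] = items[Physical(i + 1)];
        }
        else
        {
            // Fewer elements behind the insertion point: shift them one slot right.
            for (int i = count; i > index; i--) items[Physical(i)] = items[Physical(i - 1)];
        }
        items[Physical(index)] = item;
        count++;
    }

    public void Add(T item) { Insert(count, item); }     // O(1): nothing to shift
    public void Prepend(T item) { Insert(0, item); }     // O(1): only the head moves

    private void Grow()
    {
        var bigger = new T[items.Length * 2];            // stays a power of two
        for (int i = 0; i < count; i++) bigger[i] = this[i];
        items = bigger;
        head = 0;
    }
}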

Summary of some answers so far

As pointed out by @hatchet, for a circular array implementation to have indexing performance similar to that of System.Collections.Generic.List<T>, it would always have to grow to a capacity that is a power of 2 (so that the modulus needed for wrap-around can be computed with a cheap bit mask). This would mean that it is not possible to size it to an exact user-supplied initial capacity, as is currently possible when constructing a list instance. So that is a clear trade-off.
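For illustration, this is the equivalence that makes the power-of-2 requirement attractive (the class name and values here are arbitrary, just for the example):

using System;

class MaskVsModulo
{
    static void Main()
    {
        const int capacity = 16;               // must be a power of two
        const int mask = capacity - 1;         // 0b1111
        int head = 13, logicalIndex = 5;

        // With a power-of-two capacity, the wrap-around can be done with a
        // bit mask instead of an integer division/modulo:
        Console.WriteLine((head + logicalIndex) % capacity);  // 2
        Console.WriteLine((head + logicalIndex) & mask);      // 2
    }
}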

And as some quick & dirty performance tests show, indexing may be around 2-5x slower for a circular array implementation.

With indexing being an obvious priority, I can imagine this would be too big a penalty to pay in return for better performance on prepend operations and slightly better performance on random inserts/deletions.

This is a duplicate with some complementary answer information

This question is indeed related to Why typical Array List implementations aren't double-ended?, which I did not discover before posting my question. That question was not answered in a manner I found fully satisfactory, I guess:

I haven't done any benchmarks, but it just seemed to me that there would be other bottlenecks (e.g. nonlocal loads/stores) that would beat this by quite a lot. I'll probably accept this if I don't hear something more convincing though, thanks. – Mehrdad May 27 '11 at 4:18

The answers to this question provide additional information on how indexing for a circular list could be made to perform quite well, including a code example and some quantitative numbers that make the trade-off decisions much clearer. As such they provide information that is complementary to what is present in the other question. So I agree that the intent of this question was very much the same, and I agree it should therefore be considered a duplicate. It would be a shame, however, if the new information now accompanying this one were lost.

Also I am still interested in potential additional reasons for the implementation choice that may not yet be present in the answers on either question.

Alex
  • It's good to have people like Eric Lippert and Jon Skeet here. – Federico Berasategui Aug 30 '13 at 22:39
  • Not sure how you plan to have both append and prepend not require list reallocation (i.e. an append-prepend-append sequence)? There are already `Queue`, `LinkedList` (and maybe `Stack`) that are likely more suitable for use cases where one needs access to both ends of a list. – Alexei Levenkov Aug 30 '13 at 23:11
  • @AlexeiLevenkov you wrap around to the end of the allocated array (i.e. you maintain a "head" and a "tail" index, or a "head" index and a count). When prepending you decrease the head and increase the count; when appending you only increase the count. – Alex Aug 30 '13 at 23:13

2 Answers

5

Indeed a circular array could be implemented with O(1) access time. However, I do not believe the goal for the List<T> indexer was merely to be O(1). Big O notation describes how performance scales with the size of the collection being operated on. The implementors of List<T>, in addition to wanting O(1), were likely also obsessed with factors like instruction count and performance in tight loops. The indexer would need to be as close to raw array performance as possible in the same scenarios in order to be generally useful. Accessing an element of an array is a very simple operation:

  • Add the index multiplied by the element size to the array's base address
  • Dereference

Indexing into a circular array, while still O(1), involves at least one extra branch: it has to check whether the computed array index needs to wrap around, based on the desired index. This means that tight loops over arrays with known bounds would have branching logic inside them, which would be a significant step down for code paths built around fast, tight loops over a raw array.
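For illustration (this is not the actual List<T> source, and the helper names are made up), here is the shape of the difference: summing a plain array versus summing a circular view of the same storage, where the latter pays an extra compare, and potentially a branch, on every element access.

using System;

class IndexingSketch
{
    // Plain array: each access is "base + index * elementSize, dereference".
    static int SumArray(int[] a)
    {
        int sum = 0;
        for (int i = 0; i < a.Length; i++) sum += a[i];
        return sum;
    }

    // Circular view: each access also has to test whether it must wrap.
    static int SumCircular(int[] buffer, int head, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            int physical = head + i;
            if (physical >= buffer.Length) physical -= buffer.Length;  // extra compare/branch
            sum += buffer[physical];
        }
        return sum;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5 };
        Console.WriteLine(SumArray(data));                    // 15
        Console.WriteLine(SumCircular(data, 3, data.Length)); // 15 (same elements, rotated view)
    }
}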

EDIT

Ah yes, a branch wouldn't necessarily be needed. However, the instruction count would still be higher than for a plain array access, and I believe that would still factor into the concerns of the authors.

Additionally, another issue is whether or not prepend was considered a priority operation. If prepend wasn't considered a priority, then why take any performance hit in a scenario that certainly was one (indexing)? My guess is that indexing, enumerating and adding to the end of the list were the scenarios given the highest priority, and operations like prepend were likely considered rare.

JaredPar
  • Indexing elements in a circular array can be achieved without branching by performing a modulus operation (preferably implemented without performing a division). You would need 2 memmove operations in some cases, but the amount to move would be significantly less. – Alex Aug 30 '13 at 22:39
  • @Alex - List<> access is very fast, nearly as fast as plain array access. Adding modulus to that would be a significant hit to performance unless you could guarantee list sizes that are powers of two. – hatchet - done with SOverflow Aug 30 '13 at 22:41
  • @hatchet agreed, when you grow you should always grow to a power of 2. – Alex Aug 30 '13 at 22:43
  • But there is currently no such constraint on List<>. You can make its initial size be exactly what you want it to be. – hatchet - done with SOverflow Aug 30 '13 at 22:44
  • @hatchet true, that would be a trade off that could lead to deciding to implement it the way it is. On the other hand, once it needs to grow beyond what you specified as the initial capacity, it will use its own "ideal" capacity determination. – Alex Aug 30 '13 at 22:50
1

Your question made me curious what the magnitude of the runtime difference would be between List<> and a circular version, so I threw together a quick skeleton of one that forces sizes to be powers of two (which avoids modulo operations), i.e. a best-case implementation. I've left out growing because I just wanted to compare the indexer property time differences. It wasn't too bad: circularList[x] was about twice as slow as list[x], and both are quite fast. I also haven't really debugged this because I had limited time. It also probably lacks some of the validation that List<> is doing, which would make the circular list comparatively a bit slower if it had that too.

In general, I would say this behavior would just be a distraction from the primary purpose of List<>, and one that would only rarely actually be used. So you would be forcing slower performance on very many uses in order to benefit a very small number of them. I think they made a good decision in not baking it into List<>.

using System;

public class CircularList<T> {
    // start/end hold the physical indices of the first and last elements;
    // mask = capacity - 1, which works because capacity is always a power of two,
    // so (index + start) & mask is equivalent to (index + start) % capacity.
    private int start, end, count, mask;
    private T[] items;
    public CircularList() : this(8) { }

    public CircularList(int capacity) {
        int size = IsPowerOf2(capacity) ? capacity : PowerOf2Ceiling(capacity);
        this.items = new T[size];
        this.start = this.end = this.count = 0;
        this.mask = size - 1;
    }

    public void Add(T item) {
        if (this.count == 0) {
            this.items[0] = item;
            this.start = this.end = 0;
        } else {
            this.items[++this.end] = item;
        }
        this.count++;
    }

    public void Prepend(T item) {
        if (this.count == 0) {
            this.items[0] = item;
            this.start = this.end = 0;
        } else {
            this.start--;
            if (this.start < 0) this.start = this.items.Length - 1; // wrap around to the end of the array
            this.items[this.start] = item;
        }
        this.count++;
    }

    public T this[int index] {
        get {
            if ((index < 0) || (index >= this.count)) throw new ArgumentOutOfRangeException();
            return this.items[(index + this.start) & this.mask]; // (index + start) % length
        }
        set {
            if ((index < 0) || (index >= this.count)) throw new ArgumentOutOfRangeException();
            this.items[(index + this.start) & this.mask] = value; // (index + start) % length
        }
    }

    private bool IsPowerOf2(int value) {
        return (value > 0) && ((value & (value - 1)) == 0);
    }
    private int PowerOf2Ceiling(int value) {
        if (value < 0) return 1;
        switch (value) {
            case 0:
            case 1: return 1;
        }
        value--;
        value |= value >> 1;
        value |= value >> 2;
        value |= value >> 4;
        value |= value >> 8;
        return unchecked((value | (value >> 16)) + 1);
    }
}
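
For completeness, a harness along these lines can be used to compare the two indexers (this is only a sketch of such a test, using the CircularList<T> class above; it is not the exact code behind the numbers quoted):

using System;
using System.Collections.Generic;
using System.Diagnostics;

class IndexerBenchmark
{
    static void Main()
    {
        const int n = 10000000;
        var list = new List<int>(n);
        var circular = new CircularList<int>(n);   // rounds capacity up to a power of two
        for (int i = 0; i < n; i++) { list.Add(i); circular.Add(i); }

        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += list[i];
        sw.Stop();
        Console.WriteLine("List<int> indexer:         {0:F0} ms (sum {1})", sw.Elapsed.TotalMilliseconds, sum);

        sum = 0;
        sw.Restart();
        for (int i = 0; i < n; i++) sum += circular[i];
        sw.Stop();
        Console.WriteLine("CircularList<int> indexer: {0:F0} ms (sum {1})", sw.Elapsed.TotalMilliseconds, sum);
    }
}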
  • Thanks. Yes, it is clear that the current list implementation is the optimal choice for "append-only" write scenarios, and scenarios where there are limited writes and many reads. And those certainly are common scenarios. So perhaps that is indeed why the decision was made in this way. – Alex Aug 31 '13 at 00:11
  • Oh, and I'd say the validation that it is doing is quite similar (see http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/Collections/Generic/List@cs/1305376/List@cs). It appears you can improve the performance of the validation check in your code example above by using `if ((uint)index >= (uint)this.count)` – Alex Aug 31 '13 at 00:29
  • I just did some performance testing with a circular list implementation I whipped up as well (similar in approach to your code here), and on my system with .NET 4.5 performance of the circular list is almost as fast as that of list. Using 100 million ints: Add = 0.6608 vs. 0.735 seconds (factor 1.11), for sequential indexing: 0.3732 vs. 0.3811 seconds (factor 1.02). In that case it does not seem like a big sacrifice for having much faster prepends and random inserts. – Alex Aug 31 '13 at 01:21
  • Well that certainly made a difference :D Add performance still looks ok (but then again for sequential inserts it is basically doing the same as list): 5.732 ns/op vs. 7.434 ns/op (factor 1.3). The sequential indexing makes quite a difference though: 0.666 vs. 3.261 ns/op (factor 4.9) – Alex Aug 31 '13 at 01:55