4

What C# collections are best optimised for both zero-index insert and append to end?

I guess the easy one is LinkedList, but given my interest is in large byte-buffers, I suspect the memory cost per node may be prohibitively expensive for my needs.

My understanding is that a normal List has a buffer backing with a capacity usually larger than the actual data. When the data fills the buffer, a new buffer is created and the contents of the old are transferred to the new. That's great for appending to end, but horrible for growth on the beginning. My tests, appending/inserting a million random integers to a List:

  • List Append time 0.014sec.
  • List Zero-Insert time 173.343sec
Douglas
  • 53,759
  • 13
  • 140
  • 188
Meirion Hughes
  • 24,994
  • 12
  • 71
  • 122
  • A Queue inserts at one end and removes at the other end. A Deque has insertion and removal at both ends. There is no Deque in the standard library, but implementing one is not so difficult. (Java has the equivalent ArrayDeque) – Dennis_E Jan 28 '16 at 15:14
  • ahh ok, didn't knew about that one. I presume then in a deque the data sits in the middle of the capacity buffer. – Meirion Hughes Jan 28 '16 at 15:19
  • A Deque works the same as a Queue, just as both ends. The data simply "wraps around". It can sit anywhere. – Dennis_E Jan 28 '16 at 15:22

5 Answers5

6

A data structure which has the form of a list but is optimized for insertion and removal on both ends -- but not the middle -- is called a deque, which is short for "Double Ended QUEue". I do not know whether the standard library contains a deque these days.

If you are interested in techniques for constructing immutable deques, I suggest you read, in order of increasing complexity:

Should you wish to create a mutable deque it is pretty straightforward to do so using the same techniques that a list. A list is just a wrapper around an array where the array is "too big" on the far end. To make a mutable deque you can make an array that is too big but the data sits in the middle, not at the small end. You just have to keep track of what the top and bottom indices of the data are, and when you bump up against either end of the array, re-allocate the array and copy the data to the middle.

Community
  • 1
  • 1
Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • Nice answer, good to see someone mention an immutable solution using finger trees. – TheInnerLight Jan 28 '16 at 15:25
  • 1
    A common variation on this approach is to have an additional private `first` variable. To insert at the beginning of the list, you increment `first`. The indexing operator would then return `(first+index)%capacity`. This approach is more complex than Eric's suggestion, but avoids unnecessary allocations when the list still has unfilled data slots. – Brian Jan 28 '16 at 21:09
  • Rather, you decrement first. – Brian Jan 28 '16 at 21:16
  • @Brian: Isn't equivalent to a circular buffer where we just prevent from overwriting existing data by allocating a bigger one when needed? That's what I was thinking as a solution. – Luca Cremonesi Jan 29 '16 at 18:47
  • @LucaCremonesi: Yes. – Brian Feb 01 '16 at 12:58
3

You could create your own IList<T> implementation that contains two lists: one for items added to the front (stored in reverse order), and one for items added to the back (in proper order). This would ensure that all your inserts to the very beginning or end of the list are O(1) (except for the few cases where the capacity needs to be increased).

public class TwoEndedList<T> : IList<T>
{
    private readonly List<T> front = new List<T>();
    private readonly List<T> back = new List<T>();

    public void Add(T item)
    {
        back.Add(item);
    }

    public void Insert(int index, T item)
    {
        if (index == 0)
            front.Add(item);
        else if (index < front.Count)
            front.Insert(front.Count - index, item);
        else
            back.Insert(index - front.Count, item);
    }

    public IEnumerator<T> GetEnumerator()
    {
        return front.Reverse<T>().Concat(back).GetEnumerator();
    }

    // rest of implementation
}
Douglas
  • 53,759
  • 13
  • 140
  • 188
2

My solution is as Eric Lippert mentioned; keep the real data floating in the middle of the backing-buffer, rather than at the beginning. However, this is still a list, not a queue; can still Add/Remove/Replace everywhere.

public class BList<T> : IList<T>, IReadOnlyList<T>
{
    private const int InitialCapacity = 16;
    private const int InitialOffset = 8;

    private int _size;
    private int _offset;
    private int _capacity;
    private int _version;

    private T[] _items;

    public BList()
    {
        _version = 0;
        _size = 0;
        _offset = InitialOffset;
        _capacity = InitialCapacity;
        _items = new T[InitialCapacity];
    }

    public BList(int initialCapacity)
    {
        _size = 0;
        _version = 0;
        _offset = initialCapacity/2;
        _capacity = initialCapacity;
        _items = new T[initialCapacity];
    }

    public void Insert(int insertIndex, T item)
    {
        if (insertIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(insertIndex));

        var padRight = Math.Max(0, (insertIndex == 0) ? 0 : (insertIndex + 1) - _size);
        var padLeft = insertIndex == 0 ? 1 : 0;

        var requiresResize = _offset - padLeft <= 0 ||
                                _offset + _size + padRight >= _capacity;
        if (requiresResize)
        {
            var newSize = _size + 1;
            var newCapacity = Math.Max(newSize, _capacity * 2);
            var newOffset = (newCapacity / 2) - (newSize / 2) - padLeft;
            var newItems = new T[newCapacity];

            Array.Copy(_items, _offset, newItems, newOffset, insertIndex);
            Array.Copy(_items, _offset, newItems, newOffset + 1, _size - insertIndex);

            newItems[newOffset + insertIndex] = item;

            _items = newItems;
            _offset = newOffset;
            _size = newSize;
            _capacity = newCapacity;
        }
        else
        {
            if (insertIndex == 0)
                _offset = _offset - 1;
            else if (insertIndex < _size)
                Array.Copy(_items, _offset + insertIndex, _items, _offset + insertIndex + 1, _size - insertIndex);

            _items[_offset + insertIndex] = item;

            _size = _size + 1;
        }

        _version++;
    }

Full code

Test insert Count : 131072

enter image description here

Meirion Hughes
  • 24,994
  • 12
  • 71
  • 122
  • Accepted my own answer purely because it solved my explicit needs (List optimised for add-first and add-last, but still entirely editable and accessible) – Meirion Hughes Feb 19 '16 at 09:29
1

The reason lists (and arrays in general) are bad for insertion, is that all elements after the insertion index have to be moved up.

The best option for inserting at the start, is a linked list, since you only have to change the references to the first item, the previous item (on the original first item) and the next item. That are simple operations.

The problem with linked list insertion halfway, is that you have to walk the entire list in order to get the item at position X. That isn't really working for large linked lists. If you have a lot of insertion at position 0 to do, you could try (if possible) to reverse the list, that you can insert 'to the end'.

Appending on a list just sets the element at a specific index and increases the 'last index used' field. The problem with those is resizing of the backing array, but you can set a capacity on the list large enough to allow all items to be set without reallocating and moving.

Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
0

A double-ended queue is, as others have noted, what you're looking for. There is a decent list-based implementation here, which supports enqueueing/dequeuing/peeking etc. as you'd expect from a proper queue, and at both ends, whilst supporting all generic list operations as well:

IDoubleEndedQueue

DoubleEndedQueue

ConcurrentDoubleEndedQueue (if thread safety is desired)

As others have mentioned, when you implement a deque with an underlying list, insertions/removals in the middle require an array shift (the pain of which may be slightly mitigated by being able to shift in a shorter direction than the worst-case situation of insertion/removal at index 0 with a traditional list).

Iucounu
  • 1,630
  • 1
  • 20
  • 31