21

Profiling my C# application indicated that significant time is spent in List<T>.AddRange. Using Reflector to look at the code in this method indicated that it calls List<T>.InsertRange which is implemented as such:

public void InsertRange(int index, IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    if (index > this._size)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
    }
    ICollection<T> is2 = collection as ICollection<T>;
    if (is2 != null)
    {
        int count = is2.Count;
        if (count > 0)
        {
            this.EnsureCapacity(this._size + count);
            if (index < this._size)
            {
                Array.Copy(this._items, index, this._items, index + count, this._size - index);
            }
            if (this == is2)
            {
                Array.Copy(this._items, 0, this._items, index, index);
                Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
            }
            else
            {
                T[] array = new T[count];          // (*)
                is2.CopyTo(array, 0);              // (*)
                array.CopyTo(this._items, index);  // (*)
            }
            this._size += count;
        }
    }
    else
    {
        using (IEnumerator<T> enumerator = collection.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                this.Insert(index++, enumerator.Current);
            }
        }
    }
    this._version++;
}

private T[] _items;

One can argue that the simplicity of the interface (only having one overload of InsertRange) justifies the performance overhead of runtime type cheching and casting. But what could be the reason behind the 3 lines I have indicated with (*) ? I think it could be rewritten to the faster alternative:

is2.CopyTo(this._items, index);

Do you see any reason for not using this simpler and apparently faster alternative?

Edit:

Thanks for the answers. So consensus opinion is that this is a protective measure against the input collection implementing the CopyTo in a defective/malicious manner. To me it seems like a overkill to constantly pay the price of 1) runtime type checking 2) dynamic allocation of the temporary array 3) double the copy operation, when all this could have been saved by defining 2 or a few more overloads of InsertRange, one getting IEnumerable as now, the second getting a List<T>, third getting T[]. The later two could have been implemented to run around twice as fast as in the current case.

Edit 2:

I did implement a class FastList, identical to List, except that it also provides an overload of AddRange which takes a T[] argument. This overload does not need the dynamic type verification, and double-copying of elements. I did profile this FastList.AddRange against List.AddRange by adding 4-byte arrays 1000 times to a list which was initially emtpy. My implementation beats the speed of standard List.AddRange with a factor of 9 (nine!). List.AddRange takes about 5% of runtime in one of the important usage scenarios of our application, replacing List with a class providing a faster AddRange could improve application runtime by 4%.

shojtsy
  • 1,710
  • 3
  • 17
  • 24
  • @shojtsy: make sure you catch my edit 2 :) – Sam Harwell Jan 24 '10 at 17:03
  • You used a worst case scenario for your test. Try inserting a single `int` with `Insert` instead of a 4-byte array with `InsertRange` and you'll get even more of a bump. – Sam Harwell Jan 29 '10 at 23:18
  • The tested scenario is representative of the actual usage in my application. The list is a byte stream where 4-10 byte arrays are being appended many times. I understand that passing a simple Array to AddRange is where the performance penalties of the standard implementation are most visible. That was exactly my point. – shojtsy Jan 30 '10 at 21:39

3 Answers3

12

They are preventing the implementation of ICollection<T> from accessing indices of the destination list outside the bounds of insertion. The implementation above results in an IndexOutOfBoundsException if a faulty (or "manipulative") implementation of CopyTo is called.

Keep in mind that T[].CopyTo is quite literally internally implemented as memcpy, so the performance overhead of adding that line is minute. When you have such a low cost of adding safety to a tremendous number of calls, you might as well do so.

Edit: The part I find strange is the fact that the call to ICollection<T>.CopyTo (copying to the temporary array) does not occur immediately following the call to EnsureCapacity. If it were moved to that location, then following any synchronous exception the list would remain unchanged. As-is, that condition only holds if the insertion happens at the end of the list. The reasoning here is:

  • All necessary allocation happens before altering the list elements.
  • The calls to Array.Copy cannot fail because
    • The memory is already allocated
    • The bounds are already checked
    • The element types of the source and destination arrays match
    • There is no "copy constructor" used like in C++ - it's just a memcpy
  • The only items that can throw an exception are the external call to ICollection.CopyTo and the allocations required for resizing the list and allocating the temporary array. If all three of these occur before moving elements for the insertion, the transaction to change the list cannot throw a synchronous exception.
  • Final note: This address strictly exceptional behavior - the above rationale does not add thread-safety.

Edit 2 (response to the OP's edit): Have you profiled this? You are making some bold claims that Microsoft should have chosen a more complicated API, so you should make sure you're correct in the assertions that the current method is slow. I've never had a problem with the performance of InsertRange, and I'm quite sure that any performance problems someone does face with it will be better resolved with an algorithm redesign than by reimplementing the dynamic list. Just so you don't take me as being harsh in a negative way, keep the following in mind:

  • I don't want can't stand people on my dev team that like to reinvent the square wheel.
  • I definitely want people on my team that care about potential performance issues, and ask questions about the side effects their code may have. This point wins out when present - but as long as people are asking questions I will drive them to turn their questions into solid answers. If you can show me that an application gains a significant advantage through what initially appears to be a bad idea, then that's just the way things go sometimes.
Sam Harwell
  • 97,721
  • 20
  • 209
  • 280
  • The CLR allocator is extremely fast, plus the temporary array is likely to be collected before reaching Gen 1 so it shouldn't cause a problem. – Sam Harwell Jan 23 '10 at 20:38
2

It's a good question, I'm struggling to come up with a reason. There's no hint in the Reference Source. One possibility is that they try to avoid a problem when the class that implements the ICollection<>.CopyTo() method objects against copying to a start index other than 0. Or as a security measure, preventing the collection from messing with the array elements it should not have access to.

Another one is that this is a counter-measure when the collection is used in thread-unsafe manner. If an item got added to the collection by another thread it will be the collection class' CopyTo() method that fails, not the Microsoft code. The right person will get the service call.

These are not great explanations.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • The members of `List` are explicitly stated to not be thread-safe. As long as `List` and the collection of items to be inserted is manually synchronized, the calls to `CopyTo` can never occur in a dangerous manner. – Sam Harwell Jan 23 '10 at 15:33
  • 1
    Yes, but that wasn't the point. The point was: who gets the phone call when it blows up. A very real concern for a company like Microsoft. This hotfix is a good example, it is actually an exception triggered by the client code forgetting to lock: http://support.microsoft.com/kb/831730 – Hans Passant Jan 23 '10 at 15:57
  • In the above case, neither the original code nor proposed change is more thread-safe than the other. One strange thing is the fact that the call to `ICollection.CopyTo` is not located immediately after the call to `EnsureCapacity` - if it were then upon any synchronous exception the list would remain unchanged by the operation. – Sam Harwell Jan 23 '10 at 17:23
  • It has nothing to do with thread-safety. It has to do with what fails when it is used in a thread-unsafe manner. The current List<> code makes the collection's CopyTo() fail with an exception. The proposed code corrupts the List<>. – Hans Passant Jan 23 '10 at 17:38
  • `EnsureCapacity` is not thread-safe and setting `Capacity` *does* allow downsizing the array. Depending on the insertion point, two threads calling `Insert` could actually result not only the elements being in the wrong position, but actually having `_size` being greater than `Capacity` at the end of the routine. Of all the places this can fail, it's less likely that the thread safety concern will impact a call to `CopyTo` than other state variables and the internal array. – Sam Harwell Jan 23 '10 at 19:10
0

There is a problem with your solution if you think about it for a minute, if you change the code in that way you are essentially giving the collection that should be added access to an internal datastructure.

This is not a good idea, for example if the author of the List datastructure figures out a better underlying structure to store the data than an array there is no way to change the implementation of List since all collection are expecting an array into the CopyTo function.

In essence you would be cementing the implementation of the List class, even though object oriented programming tells us that the internal implementation of a datastructure should be something that can be changed without breaking other code.

  • -1: He's talking about the existing internal implementation. That implementation already relies on an object implementing `ICollection` to properly implement the `CopyTo` member of that interface. The suggested alternative neither adds nor removes from that list of assumptions. – Sam Harwell Jan 23 '10 at 15:31
  • That is true but it does not change the fact that his suggestion still breaks the encapsulation if the List internal data structure as well as relying on a proper implementation of the incoming collection. There are no guarantees that the collection that is being added to the list has a good or fast implementation of CopyTo, even worse it might have an implementation that has now access to internal data of the list that it otherwise would not have and should not have and can use to steal data that it should not have access to. I believe that my point still stands. – Kjartan Þór Kjartansson Jan 23 '10 at 16:13
  • Your initial statement was correct, but the reasoning you gave in the 2nd & 3rd paragraphs wouldn't be reasons to come to that conclusion. Also, you have to work on the assume that the implementation of `ICollection.CopyTo` is correct - that's why standard interfaces exist. The reasoning is simply that one primary objective in managed VMs is to get an out-of-bounds exception for illegal memory access, and this is a cheap way to do so. – Sam Harwell Jan 23 '10 at 17:17
  • It's too bad the Framework doesn't provide some standard sealed `CopyableArray` class containing a `System.Array` reference and included a `CopyTo` method that would copy a specified range of that array to a destination array. Such a class would allow array-backed collections to let each other copy data efficiently without exposing references to their internal arrays (passing an array reference to a non-virtual Framework method which was guaranteed not to persist it wouldn't violate encapsulation). – supercat Mar 01 '13 at 20:46