17

Background:

I admit I did not attempt to benchmark this, but I'm curious...

What are the CPU/memory characteristics of the Enumerable.ToArray<T> (and its cousin Enumerable.ToList<T>)?

Since IEnumerable does not advertise in advance how many elements it has, I (perhaps naively) presume ToArray would have to "guess" an initial array size, and then to resize/reallocate the array if the first guess appears to be too small, then to resize it yet again if the second guess appears to be too small etc... Which would give worse-than-linear performance.

I can imagine better approaches involving (hybrid) lists, but this would still require more than one allocation (though not reallocation) and quite bit of copying, though it could be linear overall despite the overhead.

Question:

Is there any "magic" taking place behind the scenes, that avoids the need for this repetitive resizing, and makes ToArray linear in space and time?

More generally, is there an "official" documentation on BCL performance characteristics?

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • 2
    Doesn't matter in small projects. Try to optimize client-side or database instead. :) – Saeed Neamati Jul 19 '11 at 16:17
  • 1
    Not a direct answer to your question, but you might find [this article](http://msmvps.com/blogs/jon_skeet/archive/2011/01/02/reimplementing-linq-to-objects-part-24-toarray.aspx) interesting (and the rest of the Edulinq series as well) – Thomas Levesque Jul 19 '11 at 16:24
  • 2
    I don’t agree that this question is a duplicate of "…better to call ToList or ToArray…" even though there is an important relationship. Both performance (this question) and memory use (other question) are interesting and nontrivial considerations. They can be described separately, but both should factor into a decision to choose one over the other. I cannot recommend any one of the answers to this or the other question as comprehensive. There are several answers that when taken together do provide a rather complete discussion of how to choose one over the other. – steve Feb 19 '16 at 21:05

3 Answers3

17

No magic. Resizing happens if required.

Note that it is not always required. If the IEnumerable<T> being .ToArrayed also implements ICollection<T>, then the .Count property is used to pre-allocate the array (making the algorithm linear in space and time.) If not, however, the following (rough) code is executed:

    foreach (TElement current in source)
    {
        if (array == null)
        {
            array = new TElement[4];
        }
        else
        {
            if (array.Length == num)
            {
                // Doubling happens *here*
                TElement[] array2 = new TElement[checked(num * 2)];
                Array.Copy(array, 0, array2, 0, num);
                array = array2;
            }
        }
        array[num] = current;
        num++;
    }

Note the doubling when the array fills.

Regardless, it's generally a good practice to avoid calling .ToArray() and .ToList() unless you absolute require it. Interrogating the query directly when needed is often a better choice.

dlev
  • 48,024
  • 5
  • 125
  • 132
8

I extracted the code behind .ToArray() method using .NET Reflector:

public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    return buffer.ToArray();
}

and Buffer.ToArray:

internal TElement[] ToArray()
{
    if (this.count == 0)
    {
        return new TElement[0];
    }
    if (this.items.Length == this.count)
    {
        return this.items;
    }
    TElement[] destinationArray = new TElement[this.count];
    Array.Copy(this.items, 0, destinationArray, 0, this.count);
    return destinationArray;
}

And inside the Buffer constructor it loops through all elements to calculate the real Count and array of Elements.

Mo Valipour
  • 13,286
  • 12
  • 61
  • 87
  • 1
    Actually, Buffer does not calculate the count. It allocates and copies elements to increasingly larger arrays as it enumerates the items -- as described by @dlev. Personally, I wonder if ToArray would be faster if it enumerated the items two times -- first to get the count and then for a single allocate&copy. I guess that would depend on whether enumerating was costly. – steve Feb 19 '16 at 20:38
3

IIRC, it uses a doubling algorithm.

Remember that for most types, all you need to store are references. It's not like you're allocating enough memory to copy the entire object (unless of course you're using a lot of structs... tsk tsk).

It's still a good idea to avoid using .ToArray() or .ToList() until the last possible moment. Most of the time you can just keep using IEnumerable<T> all the way up until you either run a foreach loop or assign it to a data source.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794