33

I'm using .Net 3.5 (C#) and I've heard the performance of C# List<T>.ToArray is "bad", since it memory copies for all elements to form a new array. Is that true?

Sam
  • 7,252
  • 16
  • 46
  • 65
George2
  • 44,761
  • 110
  • 317
  • 455
  • You might want to see [is-it-better-to-call-tolist-or-toarray-in-linq-queries](http://stackoverflow.com/questions/1105990/is-it-better-to-call-tolist-or-toarray-in-linq-queries) – nawfal Nov 10 '13 at 21:06

7 Answers7

57

No that's not true. Performance is good since all it does is memory copy all elements (*) to form a new array.

Of course it depends on what you define as "good" or "bad" performance.

(*) references for reference types, values for value types.

EDIT

In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.

public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}

Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").

EDIT 2

In response to the comment: "No re-construction of objects?" :

No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.

Joe
  • 122,218
  • 32
  • 205
  • 338
  • 2
    Thanks Joe, your answer is so cool! Do you have any related documents to discuss further or prove further of the claim -- "all it does is memory copy all elements (*) to form a new array."? – George2 Jul 18 '09 at 13:19
  • Thank Joe, Array.Copy only copy reference? No re-construction of objects? – George2 Jul 18 '09 at 13:57
  • 1
    George. Go look it up! Or go use Reflector and find out. It wasn't so complex for ToArray, was it? – John Saunders Jul 18 '09 at 16:21
  • Thanks John and Joe! My question is answered. – George2 Jul 18 '09 at 17:58
  • 1
    Note: This is the List.ToArray() implementation, not the Enumerable.ToArray(..) extension. – Curtis Yallop Jan 25 '16 at 23:50
  • @CurtisYallop - true, though the `Enumerable.ToArray` implementation is not that much different if the enumerable is a list or collection. – Joe Jan 26 '16 at 10:12
  • "... take it on trust that Microsoft's engineers won't come up with a worse solution" - Actually, I have seen many situations where they came up with a slow or ugly solution and I had to improve that myself. So, it is not the best idea to take all their actions on trust. – mas.morozov Feb 28 '20 at 16:11
23

Reasons to call ToArray()

  • If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
  • If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
  • If you know you will need to pass the returned value to a third-party function that expects an array.
  • Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).

Reasons not to call ToArray()

  • If the caller ever does need to add or remove elements, a List<> is absolutely required.
  • The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
  • The caller can always convert the list to an array themselves.

taken from here

M4N
  • 94,805
  • 45
  • 217
  • 260
Sorantis
  • 14,496
  • 5
  • 31
  • 37
  • 3
    Good refernece, but not direct answer to my question? What is your answer to my question? – George2 Jul 18 '09 at 13:11
  • 4
    It's the only answer we can give: Correctness always trumps performance. You don't the most performant thing you can that's still correct. The application of that is that you don't call .ToArray() unless you have to anyway. – Joel Coehoorn Jul 18 '09 at 13:46
  • 4
    "...there can be a performance benefit to an array over a List<>." - any evidence for this? Sounds like a myth to me. – Joe Jul 18 '09 at 13:49
  • 3
    Returning an array doesn't indicate that it can't be modified. The BCL is full of methods that return arrays and the recipient is quite free to modify the array. – Daniel Earwicker Jul 18 '09 at 13:53
  • 1
    .NET framework prior to 2.0 had non-generic collections, as well as arrays. – Daniel Earwicker Jul 18 '09 at 13:54
  • I just want to confirm only reference is copied during ToArray. No re-construction of objects during ToArray? – George2 Jul 18 '09 at 13:58
6

Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.

A List contains an array internally to hold all the elements. The array grows if the capacity is no longer sufficient for the list. Any time that happens, the list will copy all elements into a new array. That happens all the time, and for most people that is no performance problem.

E.g. a list with a default constructor starts at capacity 16, and when you .Add() the 17th element, it creates a new array of size 32, copies the 16 old values and adds the 17th.

The size difference is also the reason why ToArray() returns a new array instance instead of passing the private reference.

chris166
  • 4,769
  • 4
  • 24
  • 25
  • Thanks chris166, I just want to confirm only reference is copied during ToArray. No re-construction of objects during ToArray? – George2 Jul 18 '09 at 13:56
  • 1
    Yes, only references are copied. The list doesn't know how to create a deep copy of your objects. The exception are value types (structs, ints, doubles, enums etc). – chris166 Jul 18 '09 at 14:44
2

This is what Microsoft's official documentation says about List.ToArray's time complexity

The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.

Then, looking at Array.Copy, we see that it is usually not cloning the data but instead using references:

If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.

So in conclusion, this is a pretty efficient way of getting an array from a list.

J Gaspar
  • 21
  • 3
1

it creates new references in an array, but that's just the only thing that that method could and should do...

1

Performance has to be understood in relative terms. Converting an array to a List involves copying the array, and the cost of that will depend on the size of the array. But you have to compare that cost to other other things your program is doing. How did you obtain the information to put into the array in the first place? If it was by reading from the disk, or a network connection, or a database, then an array copy in memory is very unlikely to make a detectable difference to the time taken.

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
0

For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start.

T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;

If your source type is IEnumerable (not a List/Collection) then the source is:

items = new TElement[4];
..
if (no more space) {
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;

It starts at size 4 and grows exponentially, doubling each time it runs out of space. Each time it doubles, it has to reallocate memory and copy the data over.

If we know the source-data size, we can avoid this slight overhead. However in most cases eg array size <=1024, it will execute so quickly, that we don't even need to think about this implementation detail.

References: Enumerable.cs, List.cs (F12ing into them), Joe's answer

Curtis Yallop
  • 6,696
  • 3
  • 46
  • 36