18

Today, I faced a problem with performance while iterating through a list of items. After done some diagnostic, I finally figured out the reason which slowed down performance. It turned out that iterating through an IEnumerable<T> took much more time than iterating through a List<T>. Please help me understand why IEnumerable<T> is slower than List<T>.

UPDATE benchmark context:

I'm using NHibernate to fetch a collection of items from a database into an IEnumerable<T> and sum its property's value. This is just a simple entity without any reference type:

public SimpleEntity
{
    public int Id {get;set}
    public string Name {get;set}
    public decimal Price {get;set}
}

Public Test
{
    void Main()
    {
        //this query get a list of about 200 items
        IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                             select entity;

        decimal value = 0.0;
        foreach(SimpleEntity item in entities)
        {
             //this for loop took 1.5 seconds 
             value += item.Price;
        }

        List<SimpleEntity> lstEntities = entities.ToList();

        foreach(SimpleEntity item in lstEntities)
        {
             //this for loop took less than a milisecond
             value += item.Price;
        }
    }
}
Alberto Solano
  • 7,972
  • 3
  • 38
  • 61
Doan Cuong
  • 2,594
  • 4
  • 22
  • 39
  • That's probably because List would already be populated and an IEnumerable would be populated at the time when it will be needed.. Can you show some code? – bit May 08 '14 at 08:28
  • 4
    You really need to provide context or even the code you benchmarked because the idea that `IEnumerable` is slower than `List` solely depends on the situation. – James May 08 '14 at 08:28
  • 2
    We need to know what that `IEnumerable` is. For instance if `IEnumerable` is an array it won't hurt you, If it is a poor linq query it certainly can. – Sriram Sakthivel May 08 '14 at 08:30
  • `IEnumerable` is an interface implemented by many different collection types. How fast the enumeration is depends on the underlying collection type. – Rik May 08 '14 at 08:35
  • Yes,just like James said `depends on the situation`.Generally,IEnumerable loads data to memory before call methods like `Take` `Skip` and so on. IQueryable or List loads data to memory when it(`Take``Skip`) will be needed. – Rang May 08 '14 at 08:35
  • Question updated, please take a look – Doan Cuong May 08 '14 at 08:38
  • 2
    NHibernate explains your situation nicely. Please note my remarks about multiple iterations. You will have to iterate through the NHibernate enumerable at least once, and if you only need to do so once then converting the collection into a `List` is actually a false economy. – Rob Lyndon May 08 '14 at 08:49

5 Answers5

31

Enumerating an IEnumerable<T> is 2 to 3 times slower than enumerating the same List<T> directly. This is due to a subtlety on how C# selects its enumerator for a given type.

List<T> exposes 3 enumerators:

  1. List<T>.Enumerator List<T>.GetEnumerator()
  2. IEnumerator<T> IEnumerable<T>.GetEnumerator()
  3. IEnumerator IEnumerable.GetEnumerator()

When C# compiles a foreach loop, it will select the enumerator in the above order. Note that a type doesn't need to implement IEnumerable or IEnumerable<T> to be enumerable, it just needs a method named GetEnumerator() that returns an enumerator.

Now, List<T>.GetEnumerator() has the advantage of being statically typed which makes all calls to List<T>.Enumerator.get_Current and List<T>.Enumerator.MoveNext() static-bound instead of virtual.

10M iterations (coreclr):

for(int i ...)               73 ms
foreach(... List<T>)        215 ms
foreach(... IEnumerable<T>) 698 ms
foreach(... IEnumerable)   1028 ms
for(int *p ...)              50 ms

10M iterations (Framework):

for(int i ...)              210 ms
foreach(... List<T>)        252 ms
foreach(... IEnumerable<T>) 537 ms
foreach(... IEnumerable)    844 ms
for(int *p ...)             202 ms

Disclaimer

I should point out the actual iteration in a list is rarely the bottleneck. Keep in mind those are hundreds of milliseconds over millions of iterations. Any work in the loop more complicated than a few arithmetic operations will be overwhelmingly costlier than the iteration itself.

InteXX
  • 6,135
  • 6
  • 43
  • 80
Coincoin
  • 27,880
  • 7
  • 55
  • 76
  • 2
    Excellent answer. However, your disclaimer misses out on one very important point. The arithmetic operations will not necessarily be overwhelmingly costlier than the iteration if you iterate under **very heavy** memory pressure. In such case, the allocation of the enumerator actually becomes extremely significant as it puts additional pressure on the GC. – l33t Apr 30 '19 at 08:40
  • Although everything here (other than arguably the "2 to 3 times slower" as that will really depend on the size of the collection) is correct, it's unlikely to be significant in the question being asked. The performance difference caused in the question is *not* caused by the allocation of an iterator - it's caused by a difference in the actual type being iterated over. By the time the list is iterated over, it has all the relevant information... which is not the case for the query itself (the source of the first loop). – Jon Skeet Jan 04 '21 at 22:23
19

List<T> is an IEnumerable<T>. When you are iterating through your List<T>, you are performing the same sequence of operations as you are for any other IEnumerable<T>:

  • Get an IEnumerator<T>.
  • Invoke IEnumerator<T>.MoveNext() on your enumerator.
  • Take the IEnumerator<T>.Current element from the IEnumerator interface while MoveNext() returns true.
  • Dispose of the IEnumerator<T>.

What we know about List<T> is that it is an in-memory collection, so the MoveNext() function on its enumerator is going to be very cheap. It looks like your collection gives an enumerator whose MoveNext() method is more expensive, perhaps because it is interacting with some external resource such as a database connection.

When you call ToList() on your IEnumerable<T>, you are running a full iteration of your collection and loading all of the elements into memory with that iteration. This is worth doing if you expect to be iterating through the same collection multiple times. If you expect to iterate through the collection only once, then ToList() is a false economy: all it does is to create an in-memory collection that will later have to be garbage collected.

Rob Lyndon
  • 12,089
  • 5
  • 49
  • 74
4

List<T> is an implementation of IEnumerable<T> interface. To use the foreach syntax, you don't need a List<T> type or a IEnumerable<T> type, but you are required to use a type with a GetEnumerator() method. Quote from Microsoft docs:

The foreach statement isn't limited to those types. You can use it with an >instance of any type that satisfies the following conditions:

  • A type has the public parameterless GetEnumerator method whose return type is either class, struct, or interface type. Beginning with C# 9.0, the GetEnumerator method can be a type's extension method.
  • The return type of the GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.

Considering for example a LINQ context, performing a query, using an IEnumerable structure you have the advantange of a deferred execution of the query (the query will be executed only when needed), but, using the ToList() method, you're requesting that the query must be executed (or evaluated) immediately and you want your results in memory, saving them in a list, to perform later some operations on them, like changing some values.

About the performance, it depends on what you're trying to do. We don't know which operations you're performing (like fetching data from a database), which collection types you're using and so on.

UPDATE

The reason why you have a different timing between the IEnumerable collection iteration and the List collection iteration, is, like I said, that you have a deferred execution of the query when you're invoking:

IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                             select entity;

That means the query is executed only when you're iterating over the IEnumerable collection. This doesn't happen when you're calling the ToList() method in entities.ToList(); for the reasons I described above.

Alberto Solano
  • 7,972
  • 3
  • 38
  • 61
  • Your first statement is incorrect. `IEnumerable` has nothing to do with `foreach`. Using the `foreach` keyword merely requires you to fulfill a contract, as explained [here](https://ericlippert.com/2011/06/30/following-the-pattern/). – l33t Jan 14 '21 at 08:39
  • @l33t Thanks, you're right. I just updated my answer. – Alberto Solano Jan 18 '21 at 09:45
3

I believe it has nothing to do with IEnumerable. It's because on the first loop, when you are iterating over the IEnumerable, you are actually executing the query.

Which is completely different from the second case, when you would be executing the query here:

List<SimpleEntity> lstEntities = entities.ToList();

Making the iteration much faster because you are not actually querying the BD and transforming the result to a list while you are in the loop.

If you instead do this:

foreach(SimpleEntity item in entities.ToList())
{
    //this for loop took less than a milisecond
    value += item.Price;
}

Perhaps you would get a similar performance.

0

You are using linq.

IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>
                                         select entity;

Justs declare the query. It will be executed when foreach gets the enumerator. The 1.5 seconds include the excution of Session.Query<>.

If you measure the line

List<SimpleEntity> lstEntities = entities.ToList();

You should get the 1.5 seconds or at least more than 1 second.

Are you sure your measures are being taken correctly? You should mesaure the second loop including entites.ToList().

Cheers!

MarianoC
  • 31
  • 2