
When given an IEnumerable<T>, you could be dealing with a fixed sequence like a list or array, an AST that will enumerate some external datasource, or even an AST on some existing collection. Is there a way to safely "materialize" the enumerable so that enumeration operations like foreach, Count(), etc. don't execute the AST each time?

I've often used .ToArray() to create this representation, but if the underlying storage is already a list or other fixed sequence, that seems like wasted copying. It would be nice if I could do

var enumerable = someEnumerable.Materialize();

if (enumerable.Any()) {
  foreach(var item in enumerable) {
    ...
  }
} else {
  ...
}

without having to worry that .Any() and foreach enumerate the sequence twice, and without unnecessarily copying the enumerable.
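To make the concern concrete, here is a minimal sketch. The `Numbers` iterator and the `Executions` counter are made up for the illustration; they stand in for any deferred source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleEnumerationDemo
{
    // Counts how many times the iterator body starts running.
    public static int Executions;

    // Hypothetical deferred source: nothing runs until it is enumerated.
    public static IEnumerable<int> Numbers()
    {
        Executions++;
        yield return 1;
        yield return 2;
    }

    // Returns how often the iterator body ran for an Any() + foreach pair.
    public static int CountExecutions(bool materialize)
    {
        Executions = 0;
        IEnumerable<int> sequence = Numbers();
        if (materialize)
            sequence = sequence.ToArray();   // runs the iterator once, up front

        if (sequence.Any())                  // first enumeration
        {
            foreach (var item in sequence)   // second enumeration
            {
                _ = item;
            }
        }
        return Executions;
    }
}
```

Without materializing, the iterator body runs once for Any() and again for foreach; after ToArray() it runs only once, at the cost of a copy.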

Arne Claassen
  • This is a nice idea, but I would point out that often, existingCollection.ToList() is done to protect against mutations to the existing collection. – Ani Jan 07 '11 at 02:10
  • The issue with .ToList() is that it will create a new list from enumerables that are already materialized but aren't lists (arrays, ICollections, etc.), and it returns a mutable collection. – Arne Claassen Jan 07 '11 at 14:48

3 Answers


Original answer:

Same as Thomas's answer, just a bit better in my opinion:

public static ICollection<T> Materialize<T>(this IEnumerable<T> source)
{
    // Null check...
    return source as ICollection<T> ?? source.ToList();
}

Please note that this returns the existing collection itself if it is already a valid collection type, and produces a new collection otherwise. The two cases are subtly different (changes to the result may or may not affect the original), but I don't think that will be an issue in practice.
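A small sketch of that difference, for illustration only. The `AliasingDemo` class is made up, and the null check is filled in with one plausible choice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeExtensions
{
    // Same shape as the method above; the null check shown is an assumption.
    public static ICollection<T> Materialize<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return source as ICollection<T> ?? source.ToList();
    }
}

public static class AliasingDemo
{
    // A List<T> is already an ICollection<T>, so the same instance comes back.
    public static bool ListIsReturnedAsIs()
    {
        var list = new List<int> { 1, 2, 3 };
        return ReferenceEquals(list, list.Materialize());
    }

    // A deferred query is copied, so later changes to the list are not seen.
    public static int SnapshotCountAfterMutation()
    {
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> query = list.Where(x => x > 1);
        ICollection<int> snapshot = query.Materialize();
        list.Add(4);                 // does not reach the snapshot
        return snapshot.Count;
    }
}
```

In the first case the caller gets an alias of the original list; in the second it gets an independent snapshot containing 2 and 3.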


Edit:

Today this is a better solution:

public static IReadOnlyCollection<T> Materialize<T>(this IEnumerable<T> source)
{
    // Null check...
    switch (source)
    {
        case IReadOnlyCollection<T> readOnlyCollection:
            return readOnlyCollection;

        case ICollection<T> collection:
            return new ReadOnlyCollectionAdapter<T>(collection);

        default:
            return source.ToList();
    }
}

public class ReadOnlyCollectionAdapter<T> : IReadOnlyCollection<T>
{
    readonly ICollection<T> m_source;

    public ReadOnlyCollectionAdapter(ICollection<T> source) => m_source = source;

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Count => m_source.Count;

    public IEnumerator<T> GetEnumerator() => m_source.GetEnumerator();
}

Mind you, the above solution misses a certain covariant case, where the collection type implements ICollection<T> but not IReadOnlyCollection<T>. For example, suppose you have a collection like the one below:

class Collection<T> : ICollection<T>
{
    // ICollection<T> members omitted for brevity
}

// and then
IEnumerable<object> items = new Collection<Random>();

The above compiles since IEnumerable<T> is covariant.

// later at some point if you do
IReadOnlyCollection<object> materialized = items.Materialize();

The above code creates a new List<object> (an O(N) copy), even though we passed an already materialized collection. The reason is that ICollection<T> is not a covariant interface (it can't be, since T also appears in input positions such as Add), so the cast from Collection<Random> to ICollection<object> fails and the default: case in the switch is executed.

I believe it is extremely rare for a collection type to implement ICollection<T> but not IReadOnlyCollection<T>, and I would just ignore that case. Scanning the BCL, I could find only a very few such types, and hardly any that are widely known. If you do need to cover that case as well, you could use some reflection. Something like:
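The failing cast can be checked directly. The following is only a sketch: `BareCollection<T>` is a made-up stand-in for the Collection<T> above, with the ICollection<T> members delegated to an inner list so that it compiles.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that implements ICollection<T> and nothing more.
public class BareCollection<T> : ICollection<T>
{
    private readonly List<T> _items = new List<T>();

    public int Count => _items.Count;
    public bool IsReadOnly => false;
    public void Add(T item) => _items.Add(item);
    public void Clear() => _items.Clear();
    public bool Contains(T item) => _items.Contains(item);
    public void CopyTo(T[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);
    public bool Remove(T item) => _items.Remove(item);
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CovarianceDemo
{
    // IEnumerable<T> is covariant, so both assignments compile.
    static readonly IEnumerable<object> Items = new BareCollection<Random>();
    static readonly IEnumerable<object> List = new List<Random>();

    // ICollection<T> is invariant: BareCollection<Random> is not an ICollection<object>.
    public static bool BareIsCollectionOfObject => Items is ICollection<object>;

    // BareCollection<T> does not implement IReadOnlyCollection<T> at all.
    public static bool BareIsReadOnlyCollectionOfObject => Items is IReadOnlyCollection<object>;

    // List<Random> implements IReadOnlyCollection<Random>, which is covariant.
    public static bool ListIsReadOnlyCollectionOfObject => List is IReadOnlyCollection<object>;
}
```

Only the List<Random> case passes a type test, which is why a type like BareCollection<Random> falls through to the copying branch.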

public static IReadOnlyCollection<T> Materialize<T>(this IEnumerable<T> source)
{
    // Null check...

    if (source is IReadOnlyCollection<T> readOnlyCollection)
        return readOnlyCollection;

    if (source is ICollection<T> collection)
        return new ReadOnlyCollectionAdapter<T>(collection);
    
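    // Use your type checking logic here. One possible check:
    // does the runtime type implement ICollection<X> for some X?
    if (source.GetType().GetInterfaces().Any(i =>
            i.IsGenericType && i.GetGenericTypeDefinition() == typeof(ICollection<>)))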
        return new EnumerableAdapter<T>(source);

    return source.ToList();
}

public class EnumerableAdapter<T> : IReadOnlyCollection<T>
{
    readonly IEnumerable<T> m_source;

    public EnumerableAdapter(IEnumerable<T> source) => m_source = source;

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Bound at run time; assumes the runtime type exposes a public Count property.
    public int Count => ((dynamic)m_source).Count;

    public IEnumerator<T> GetEnumerator() => m_source.GetEnumerator();
}
nawfal

Easy enough:

public static IList<TSource> Materialize<TSource>(this IEnumerable<TSource> source)
{
    if (source is IList<TSource>)
    {
        // Already a list, use it as is
        return (IList<TSource>)source;
    }
    else
    {
        // Not a list, materialize it to a list
        return source.ToList();
    }
}
Thomas Levesque
  • This is a good approach. I think it would be better to return an `IEnumerable<T>` instead, and also check for `ICollection<T>` and `ICollection`. – Ani Jan 07 '11 at 02:13
  • This is subtly different from the LINQ ToList() implementation, which appears to always return a new copy, so changes to the result don't change the original. Materialize as written will, depending on the type of input, sometimes return a copy and sometimes return the original, so changes to the result sometimes change the original. – Handcraftsman Jan 07 '11 at 13:52
  • Ani's got the right idea. My intention is not to create a mutable list, just an `IEnumerable<T>` that is safe and efficient to enumerate multiple times. Also, while I have never tested it, I assume that ToArray() is the cheaper fallback materializer. – Arne Claassen Jan 07 '11 at 14:53
  • @Handcraftsman - are you sure .ToList() always returns a new list? I thought it specifically didn't unless it had to. MSDN does not specify the behavior and I haven't cracked open Resharper to be sure. – Arne Claassen Jan 07 '11 at 14:56
  • @Arne I can't say for certain without access to the code, but a little experimentation with ToList() (https://github.com/handcraftsman/Scratch/blob/master/src/Scratch/LinqToListBehavior/Tests.cs) appears to verify that behavior. – Handcraftsman Jan 07 '11 at 15:04
  • Actually, I'm not sure ToArray is cheaper than ToList... they do mostly the same work, but ToArray has to trim the excess size, while ToList doesn't (because it tracks the capacity and count separately). – Thomas Levesque Jan 07 '11 at 15:42
  • And yes, `ToList()` always returns a new instance of `List<T>`. Look at the code in Reflector: it just passes the source sequence to the constructor of `List<T>`. – Thomas Levesque Jan 07 '11 at 15:43
  • @Thomas - yeah, just looked at it in Reflector. Well, should have done that a long time ago :) – Arne Claassen Jan 07 '11 at 16:58
  • Or in one line, `return source as IList<TSource> ?? source.ToList();` – nawfal Sep 28 '13 at 13:59

Check out this blog post I wrote a couple of years ago: http://www.fallingcanbedeadly.com/posts/crazy-extention-methods-tolazylist

In it, I define a method called ToLazyList that effectively does what you're looking for.

As written, it will eventually make a full copy of the input sequence. You could tweak it so that instances of IList<T> don't get wrapped in a LazyList, which would avoid the copy, but that change carries the assumption that any IList<T> you get is already effectively memoized.
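For readers who don't want to follow the link, the general idea of a memoizing wrapper can be sketched as follows. This is not the code from the blog post: `MemoizedEnumerable<T>` and `MemoizeDemo` are made-up names, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: pulls from the source on demand and caches what it has seen.
public sealed class MemoizedEnumerable<T> : IEnumerable<T>
{
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _source;          // set to null once the source is exhausted

    public MemoizedEnumerable(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _source = source.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index++];     // replay a cached item
            }
            else if (_source != null && _source.MoveNext())
            {
                _cache.Add(_source.Current);      // pull one more item, then loop to yield it
            }
            else
            {
                _source?.Dispose();               // source exhausted: release it
                _source = null;
                yield break;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class MemoizeDemo
{
    public static int Pulled;                     // how many items the source has produced

    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 3; i++) { Pulled++; yield return i; }
    }

    // Returns the Pulled counter after First(), after Sum(), and after Count().
    public static int[] Run()
    {
        Pulled = 0;
        var lazy = new MemoizedEnumerable<int>(Source());
        lazy.First();   int afterFirst = Pulled;  // only one item pulled so far
        lazy.Sum();     int afterSum = Pulled;    // remaining items pulled once
        lazy.Count();   int afterCount = Pulled;  // served entirely from the cache
        return new[] { afterFirst, afterSum, afterCount };
    }
}
```

The source is consumed at most once, but only as far as callers actually enumerate, which is what distinguishes this from an eager ToList() or ToArray().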

johnnyRose
Amanda Mitchell
  • That's a really interesting extension, but I don't think it's related to what the OP wants. That *defers* the materialization of the sequence on a need-basis, whereas the OP wants to *eagerly* materialize the sequence in an efficient manner, getting a reference to an existing collection if need be. – Ani Jan 07 '11 at 02:08