5

What's the best practice to check if a collection has items?

Here's an example of what I have:

var terminalsToSync = TerminalAction.GetAllTerminals();

if(terminalsToSync.Any())
    SyncTerminals(terminalsToSync);
else
    GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.

What's the best way to avoid this?

I'd like a good solution that can be used in other cases too; possibly without converting it to List.

Thanks in advance.

Samuel Slade
  • 8,405
  • 6
  • 33
  • 55
Fedor Hajdu
  • 4,657
  • 3
  • 32
  • 50
  • why do you need execute the store proc the second time? terminalsToSync already contains the SP result isn't it? – Massimiliano Peluso Feb 13 '12 at 09:42
  • I do not need, nor want to execute it for the second time. What I said in my question was that since GetAllTerminals() returns IEnumerable both Any() and foreach will call it and the procedure will end up called twice. I'd like to avoid that behavior and still be able to check if there are any rows. – Fedor Hajdu Feb 13 '12 at 09:43
  • See Marc Gravell @ http://stackoverflow.com/questions/5047349/how-to-check-if-ienumerable-is-null-or-empty ?? – SpaceBison Feb 13 '12 at 09:43
  • 3
    why **specifically** do you want to avoid using a list, which is a simple and effective approach here? – Marc Gravell Feb 13 '12 at 09:44
  • @SpaceBison no, that will still execute twice – Marc Gravell Feb 13 '12 at 09:45
  • could GetAllTerminals use an out parameter or tuple to let you know the number of records returned? (or maybe null when it's empty?) – Dave Cousineau Feb 13 '12 at 09:45
  • @MarcGravell I don't have to do it, I was just searching for a generic solution that can be used in other places too. – Fedor Hajdu Feb 13 '12 at 09:48
  • 2
    @MarcGravell: since OP is asking for "best practices", it would be worth mentioning the drawbacks of copying a large number of results to a list. – vgru Feb 13 '12 at 09:51
  • @Groo as opposed to? Where do you suppose they would be if they weren't in a list? – MattDavey Feb 13 '12 at 09:56
  • Added a simpler way of looking at the problem... – Marc Gravell Feb 13 '12 at 10:01
  • @MattDavey: when reading from a database, results are usually returned in a sequential, non cached manner (an implementation of `IEnumerable`). If this weren't the case, each query would need to allocate a large amount of memory to store entire results. With `IEnumerable`, you are always fetching only one result at a time. That's also the philosophy behind a `DataReader`: you only read one element at a time and then advance to the next one. – vgru Feb 13 '12 at 10:03
  • @MattDavey: Of course, you will usually want to make your SQL queries such that they return a smallest set of elements in the first place (you won't probably be filtering or summing elements in your code), but imagine what would happen if you wanted to export a DB table to a csv file? Databases can handle lots of data, disk files can too, but memory if always limited. – vgru Feb 13 '12 at 10:04
  • @Groo yes this is true if you're using frameworks such as LinqSql or EntityFramework. But sooner or later you're going to have to copy the results into an in-memory data structure of some kind - it's prudent to take control over when that happens. In this particular question we don't know that the result from GetAllTerminals is even deferred. It could be an array for all we know.. – MattDavey Feb 13 '12 at 10:08
  • @MattDavey: it isn't strictly related to ORM frameworks. For any properly implemented DAL, result of a query will most likely never be an array (nor a list). As I've mentioned, plain `DataReader` behaves the same way for this same reason. For small sets of data, you can get away with copying it all into memory, but if `SyncTerminals` needs only to loop through the elements without actually displaying them all at once, then it is unnecessary to store them all in memory. – vgru Feb 13 '12 at 10:56

8 Answers8

5

I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...

I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:

bool any = false;
foreach(var terminal in terminalsToSync)
{
  if(!any)any = true;
  //....
}

if(!any)
  GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.

Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:

var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
  do
  {
    //sync enumerator.Current
  } while(enumerator.MoveNext())
}
else
  GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Andras Zoltan
  • 41,961
  • 13
  • 104
  • 160
  • @MattDavey - Of course it is; the best solution is to use `ToArray()`, which other answers have said, there really is no legitimate argument against it. But I'm providing an answer within the restrictions given by the question. – Andras Zoltan Feb 13 '12 at 09:58
  • +1 for first sentence - I cannot fathom why the OP is so opposed to this approach. – MattDavey Feb 13 '12 at 09:58
  • 1
    @MattDavey - yeah; unless I suppose that enumerable has lots and lots of large objects. But even then it's tenuous. Perhaps it's a Windows Mobile or something. – Andras Zoltan Feb 13 '12 at 10:01
  • The mind boggles - the accepted answer does exactly the same thing but in a much more convoluted way. Bizarre. – MattDavey Feb 13 '12 at 10:12
  • @MattDavey - yeah I don't really understand it. But Marc's is generic which I missed as a 'nice-to-have'. – Andras Zoltan Feb 13 '12 at 11:20
3

Personally i wouldnt use an any here, foreach will simply not loop through any items if the collection is empty, so i would just do it like that. However i would recommend that you check for null.

If you do want to pre-enumerate the set use .ToArray() eg will only enumerate once:

var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();

if(terminalsToSync.Any())
    SyncTerminals(terminalsToSync);
undefined
  • 33,537
  • 22
  • 129
  • 198
  • 1
    +1 for ToArray(). When working with enumerables it's important to know (and take control over) when the collection is stabilized. – MattDavey Feb 13 '12 at 09:50
3

How about this, which still defers execution, but buffers it once executed:

var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();

with:

public static class LazyEnumerable {
    public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
        if (source is LazyWrapper<T>) return source;
        return new LazyWrapper<T>(source);
    }
    class LazyWrapper<T> : IEnumerable<T> {
        private IEnumerable<T> source;
        private bool executed;
        public LazyWrapper(IEnumerable<T> source) {
            if (source == null) throw new ArgumentNullException("source");
            this.source = source;
        }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
        public IEnumerator<T> GetEnumerator() {
            if (!executed) {
                executed = true;
                source = source.ToList();
            }
            return source.GetEnumerator();
        }
    }
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • 3
    This isn't really any different from calling ToList() though! – MattDavey Feb 13 '12 at 10:11
  • 1
    @MattDavey the only difference is **when** it is invoked. Actually, personally I like to ***know*** when my data access happens, so yes: I'd be happy with a `ToList()` – Marc Gravell Feb 13 '12 at 10:17
  • This is not really lazy. A true lazy List example ca be seen [here](http://www.siepman.nl/blog/post/2013/10/09/LazyList-A-better-LINQ-result-cache-than-List.aspx). – Alex Siepman Oct 13 '13 at 08:31
  • @Alex "lazy" is an overloaded term that can mean different things. The minimum is that the data is not loaded *at all* until it is needed - so yes, the above is lazy. What you are referring to is how *much* of the data is loaded at once. Spooling that on-demand may be advantageous in some scenarios - in others it may be undesirable. Either way (load all at once, or spooled) it is still "lazy" – Marc Gravell Oct 13 '13 at 17:32
  • @MarcGravell I agree that Lazy can mean different things. In the LazyWrapper the difference with .ToList() is small if anything is done with the source. – Alex Siepman Oct 13 '13 at 17:54
2
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();

if(terminalsToSync.Any())
    SyncTerminals(terminalsToSync);
else
    GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Chris
  • 21
  • 1
  • 1
    Thanks for the answer, but I mentioned in the question that I would like to avoid converting it to list because I'm looking for a generic solution. – Fedor Hajdu Feb 13 '12 at 09:50
  • @FedorHajdu what exactly do you mean by this? How is converting it to a list not a generic solution? – MattDavey Feb 13 '12 at 09:55
  • In order to find out if terminalsToSync has any items in it you have to enumerate the enumerable first. – Chris Feb 13 '12 at 15:19
1

All the caching solutions here are caching all items when the first item is being retrieved. It it really lazy if you cache each single item while the items of the list are is iterated.

The difference can be seen in this example:

public class LazyListTest
{
    private int _count = 0;

    public void Test()
    {
        var numbers = Enumerable.Range(1, 40);
        var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
        var total = numbersQuery.Take(3)
            .Concat(numbersQuery.Take(10))
            .Concat(numbersQuery.Take(3))
            .Sum();
        Console.WriteLine(_count);
    }

    private int GetElement(int value)
    {
        _count++;
        // Some slow stuff here...
        return value * 100;
    }
}

If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!

An example of the implementation of LazyList can be found here.

Alex Siepman
  • 2,499
  • 23
  • 31
1

.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()

Massimiliano Peluso
  • 26,379
  • 6
  • 61
  • 70
  • +1 people often say that Any() is faster because it short-circuits, but this is not always the case as Count() can sometimes use the IList.Count property if the collection has been stabilised. Length is not available on IEnumerable. – MattDavey Feb 13 '12 at 09:52
  • afaik you can't call Count (property) on IEnumerable you can call Count() extension method but it'll be slower than Any() because it has to loop through entire collection and count the result then just compare that to 0 while Any() will return true after it hits the first element. – Fedor Hajdu Feb 13 '12 at 09:53
  • This is only true if results are first copied to a `List` or an `Array`, of course. @MattDavey: that's the wrong way to look at it. If you need to check if a collection has **any** elements, you need to use `Any`. For `IEnumerable`, `Count()` is a complete disaster, while for a `List`, `Any` will simply check its count and return immediately. If your code is decoupled, then your method accepting an `IEnumerable` cannot know which implementation will it receive. – vgru Feb 13 '12 at 09:53
  • @FedorHajdu this is not true. Count() does **not necessarily** loop through the collection at all. If the collection has been stabilised to a list or array, Count() will simply call IList.Count property. Any() on the other hands will **always** begin iterating through the collection. Any() is only faster if the collection has not been stabilised. – MattDavey Feb 13 '12 at 10:04
  • @Groo yes this is what I was trying to say by 'stabilised', I'm not sure if this is a universally recognised term :) – MattDavey Feb 13 '12 at 10:05
  • @MattDavey: but on one side you have: 1. `IEnumerable.Any()` -> **always** an O(1) operation, never slower, and 2. `IEnumerable.Count()` -> **sometimes** an O(1) operation, but potentially O(n). The performance cost of instantiating an `IEnumerator` for a `List.Any()` is so negligible that there is no practical speed gain, and I would consider it the latter approach at all. – vgru Feb 13 '12 at 10:11
  • I guess there's two wildly different approaches here. Like Marc, personally I like to know when the collection has been stabilized or not, otherwise you can't know for sure what's going on :) – MattDavey Feb 13 '12 at 10:24
1

Here's another way of approaching this problem:

int count = SyncTerminals(terminalsToSync);
if(count == 0)  GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

where you change SyncTerminals to do:

int count = 0;
foreach(var obj in terminalsToSync) {
    count++;
    // some code
}
return count;

Nice and simple.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
0

If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.

The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:

public class CachedEnumerable<T> : IEnumerable<T>
{
    public CachedEnumerable<T>(IEnumerable<T> enumerable)
    {
        result = new Lazy<List<T>>(() => enumerable.ToList());
    }

    private Lazy<List<T>> result;

    public IEnumerator<T> GetEnumerator()
    {
        return this.result.Value.GetEnumerator();
    }

    System.Collections.IEnumerable GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Paul Turner
  • 38,949
  • 15
  • 102
  • 166