0

I believe the answer to this question is well explained here:LINQ Ring: Any() vs Contains() for Huge Collections

But my question is specific for the current implementation

IEnumerable<T> msgs = null;

/// ...
/// some method calls which returns a long list of messages
/// The return type of the method is IEnumerable<T>
/// List<T> ret = new List<T>();
/// ...
/// return ret
/// ...
if (msgs.Any())
    object= msgs.Last();

The msgs is an in memory collection (IEnumerable) said. How does Any() work here? There's no condition for this Any() method call, isn't it just O(1) instead? Or it still looks through each element?

DarkChocoBar
  • 77
  • 1
  • 12
  • 2
    If the `IEnumerable` is a `ICollection` it looks at the count, if its not, it will call `IEnumerator` `MoveNext` and return the result of that. There is no magic here – TheGeneral Jul 10 '21 at 00:05
  • 1
    What is the underlying type here? We can't give any specifics unless you give us specifics. Is it a `List` or a more complex thing like a DB query? – DavidG Jul 10 '21 at 00:25
  • @DavidG: OP said that it's an in memory collection, not a DB query, so this is Linq to Objects. But i'm sure it's not a collection, otherwise `Any` and `Last` would be very efficient, but he [has performance issues](https://stackoverflow.com/questions/68318681/c-sharp-linq-methods-time-complexity). – Tim Schmelter Jul 10 '21 at 00:35
  • 1
    `The msgs is an in memory collection (IEnumerable) said` What **specific** type is it? Please share a [mcve]. – mjwills Jul 10 '21 at 00:54
  • 1
    @TheGeneral `Any` never looks at the `Count` property. https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs `Count()` extension method tries and looks for `ICollection.Count` and `ICollection.Count`, but not Any(). – lidqy Jul 10 '21 at 01:12
  • 1
    If you do `Console.WriteLine(msgs.GetType())`, what do you see printed in the console? – Theodor Zoulias Jul 10 '21 at 01:15
  • @lidqy when I step into the source any on a .net5 console app, i get this https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/AnyAll.cs type of file, not sure what its different to the master branch – TheGeneral Jul 10 '21 at 03:01
  • @TheGeneral I think I go to sleep now. Your source code is a bitter more recent (.NET Core) than mine (.NET 4.7.2). So thank's for the update... – lidqy Jul 10 '21 at 03:06

2 Answers2

3

I assume that IEnumerable<BaseJournalMessage> msgs is not a collection like an array or list, otherwise the Any and Last would be no problem(but you have performance issues). So it seems to be an expensive LINQ query which gets executed twice, once at Any and again at Last.

Any needs to enumerate the sequence to see if there is at least one. Last needs to enumerate it fully to get the last one. You can make it more efficient in this way:

BaseJournalMessage last = msgs.LastOrDefault();
if (last != null)
    time = last.JournalTime;

To explain a bit more. Consider msg was an array:

IEnumerable<BaseJournalMessage> msgs = new BaseJournalMessage[0];

Here Any is simple and efficient since it just needs to check if the enumerator from the array has one element, same with other collections. The complexity is O(1).

Now consider that it's a complex query, like it seems to be in your case. Here the complexity of a following Any is clearly not O(1).:

 IEnumerable<BaseJournalMessage> msgs = hugeMessageList
    .Where(msg => ComplexMethod(msg) && OtherComplexCondition(msg))
    .OrderBy(msg => msg.SomeProperty);

This is not a collection since you don't append ToList/ToArray/ToHashSet. Instead it's a deferred executed LINQ query. You will execute it every time it will be enumerated. That could be a foreach-loop, an Any or Last call or any other method that enumerates it. Sometimes it's useful to always get the currrent state, but normally you should materialize the query to a collection if you have to access it multiple times. So append ToList and everything's fine.

Have a look at the term "deferred execution" in each LINQ method(as for example Where, Select or OrderBy) if you want to know whether it's executing a query or not. You can chain as many deferred executed methods as you want without actually executing the query. But if a method contain "forces immediate query evaluation"(like for example ToList) the query gets executed(so avoid those methods in a middle of a query).

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Not my downvote, but e.g. `IEnumerable msgs = new List();` would be in-memory. I don't see a reason to assume the enumerable is backed by an `IQueryable()`. – Eric J. Jul 10 '21 at 00:28
  • 2
    @EricJ.: I haven't said it's `IQueryable`. I have also edited it since i forgot to select the `JournalTime` property in my first version, maybe that was the reason for the downvote. If i say query it has nothing to do with a database call. I just meant to say that its an expensive Linq query. I have explained it in the answer – Tim Schmelter Jul 10 '21 at 00:29
  • So LastOrDefault will eliminate the performance issue? But it seems like LastOrDefault is also O(N), is that correct? Also I just updated some code comments in the question: the msgs in my questions is an IEnumerable, but there's a method call that returns List that assigns to the msgs. – DarkChocoBar Jul 12 '21 at 16:16
  • @WeilunYuan yes, it's correct that LastOrDefault is also O(n) in case source is not a collection, otherwise the method tries to cast it to IList and if that succeeds just uses the indexer, then it's O(1). But it's still more efficuent as to use Any AND Last which is 2 * O(n) – Tim Schmelter Jul 12 '21 at 16:26
  • If source(msg) is actually a List there is no Performance issue here, neither with Ay nor with Last. Then you are looking at the wrong part of the code. Maybe the method that returns the list is the problem. – Tim Schmelter Jul 12 '21 at 17:07
  • @TimSchmelter It's not quite making sense now. I specifically logged processing time with stopwatch everywhere. I believe Any() at most takes 6 seconds to process, and last() at most about 6 seconds as well (12 seconds in total). Once I replace them with LastOrDefault(), so far I only see it goes up to 6 seconds.. There's also a part that's taking some time to process later in my software, getting the journal buffer, but that is some legacy code comes with the product, so there's not much optimization that can be done there i believe. – DarkChocoBar Jul 12 '21 at 21:06
  • @WeilunYuan that's not possible if msg is a list. If it's a list both Any and Last have 0(1) complexity. Somewhere between the returned list and your above code it's modified. For example msg = msg.Where(someCondition). Cast it to List, im sure it will fail. – Tim Schmelter Jul 13 '21 at 02:04
  • @TimSchmelter But even casting it to List would take a long time to process, right? – DarkChocoBar Jul 13 '21 at 06:54
  • @TimSchmelter I think the issue is there because the return type of that method is an IEnumerable, even when inside the method, it's returning an initialized List object – DarkChocoBar Jul 13 '21 at 07:03
  • No, casting does not take time, no matter what's inside. It doesnt matter what you return, you can always cast it back to it's real type. You had to show your real code if you wanted us to help. – Tim Schmelter Jul 13 '21 at 07:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/234822/discussion-between-weilun-yuan-and-tim-schmelter). – DarkChocoBar Jul 13 '21 at 14:51
0

How does Any() work here? There's no condition for this Any() method call, isn't it just O(1) instead? Or it still looks through each element?

As for LinQ-To-Object, implemented in System.Linq.Enumerable static class, the implementation of Any() just gets the IEnumerator and invokes MoveNext(). If the result is true, Any() returns true itself. Otherwise it returns false. It never iterates any further.

So it is a pure O(1) algorithm. EDIT: I have to correct myself: The time depends on the enumerable "Any" iterates. I had a misconception of the Big O notation and the meaning of "O(1)" and "O(n)".

This is the source code (source available at GitHub these days):

    public static bool Any<TSource>(this IEnumerable<TSource> source) {
        if (source == null) throw Error.ArgumentNull("source");
        using (IEnumerator<TSource> e = source.GetEnumerator()) {
            if (e.MoveNext()) return true;
        }
        return false;
    }
Dharman
  • 30,962
  • 25
  • 85
  • 135
lidqy
  • 1,891
  • 1
  • 9
  • 11
  • 2
    `So it is a pure O(1) algorithm.` is overly simplistic since it isn't a fixed cost since it varies based on the ienumerable it is called on. If it is a result of a projection that is quite complex, it may be quite expensive. – mjwills Jul 10 '21 at 01:25
  • 2
    Yes, that e.MoveNext could take a day to execute even with Linq-to-objects. You cant say it's O(1) since you dont know anything about the sourxe. – Tim Schmelter Jul 10 '21 at 02:08
  • 1
    @TimSchmelter sure I can see it's O(1). because it obviously is. The source is at GitHub. If the 1 iteration in O(1) takes a day it's still O(1). – lidqy Jul 10 '21 at 02:13
  • @mjwiolls. `Any()`(in the Linq-2-object implementation) has an algorithm of O(1). there is no need to argue about. It may still seem long to execute because it the enumerable it iterates may contain loads of Iterators (Where* Grouping* Orderered* ..) that have not yet been executed and whose execution is triggered by Any(). But these algorithms are not the algorithms of any() but of the the other Linq methods that constructed these iterators. – lidqy Jul 10 '21 at 02:18
  • 2
    @lidqy That may be _technically_ accurate, which is my point. But what people are interested in when discussing Big O is the expected cost of an operation. When you are saying it is O(1), there is a danger it will be interpreted as something that isn't true. It would be simpler and clearer to say "the cost of `Any` is whatever the cost is of accessing the first element". Clean, clear, impossible to misunderstand. – mjwills Jul 10 '21 at 02:20
  • @mjwells It didn't say it's fast. I said it's O(1). and O(1) means it just takes one operation on the collection. And that what Sytem.Linq.Enumerable.Any() *always* does under all circumstances. The delay that may occur when calling Any is due to call of GetEnumerator. This kind of delay also occurs in foreach, ToArray, ToList etc and is due to the deffred execution that will then take place. – lidqy Jul 10 '21 at 02:23
  • 2
    @lidqy no, O(1) doesnt mean it's taking one Operation. It means that it takes a constant time no matter what is the source. This is not the case. The source could be a complex Linq-to-objects query which is clearly not O(1) and the Any needs to execute it. The correct answer is that you cant say anything about the complexity of Any without knowing the source. The same applies to other methods like First – Tim Schmelter Jul 10 '21 at 02:33
  • @TimSchmelter okay I agree I thought "O(1)" means there is just one operation on the collection and "O(n)" means "as many operations as items". Of course an implementation that deals with any kind of enumerable implementation can't take a constant time. Especially in Linq where all internal enumerables are built on yield return. So thanks for the clarification. – lidqy Jul 10 '21 at 02:39
  • 1
    You can say that Any or First that take a predicate have an O(n) complexity but you cant say that Any or First without predicate have O(1). That's just true for collections. – Tim Schmelter Jul 10 '21 at 02:41
  • @TimSchmelter I still wonder why you can say e..g a collection has O(1) complexity for e.g. "Contains(T item)". If you take a HashSet the actual time spent depends on the hash code of T and if it is bad , you have multiple collisions and you will have to scan the buckets and it will takes longer as compared to neat hash code where each bucket has one item. – lidqy Jul 10 '21 at 02:46
  • @lidqy [Hash table](https://en.wikipedia.org/wiki/Hash_table) has *average* lookup time O(1) which is different from [*worst case* complexity](https://en.wikipedia.org/wiki/Time_complexity) of getting element by index from an array being O(1). (Note that I don't see anyone claiming Contains for collection is O(1) in this comment thread...) – Alexei Levenkov Jul 10 '21 at 02:58