18

I am doing some comparisons about where to filter out items from a list. I am unsure of doing it directly which would be O(n), or using .Where(). I made a simple example to test .Where() on a simple data set. There are n=100 items, and when I run the debugger on the line in the function BigO() it goes exactly 100 times making me think that .Where() is also O(n). What I couldn't figure out was where the data was being stored during the operation and I wasn't sure if that was adding any increased complexity.

Am I missing something, or is .Where() O(n)?

public class ListerFactory
{

 public class Lister
 {
  bool includeItem { get; set; }
 }

 List<Lister> someList { get; set; }

 public ListerFactory()
 {
  someList = new List<Lister>();
  BuildLister();
 }    

 public void BuildLister()
 {
  for(int i = 0; i < 100; i++)
  {
   var inc = new Lister();
   inc.includeItem = i % 2;
   someList.Add(inc);
  }
  BigO();
 }

 public void BigO()
 {
  someList = someList.Where(thisList => thisList.includeItem == true).ToList();
 }
}
Travis J
  • 81,153
  • 41
  • 202
  • 273
  • LINQ to _what_? Not that it matters... – SLaks Mar 25 '12 at 22:51
  • 3
    check out John Skeets edulinq, a lot of things will quickly become obvious about how things are working. In fact, you'll quickly realise how simple the system actually is. https://msmvps.com/blogs/jon_skeet/archive/tags/Edulinq/default.aspx – Keith Nicholas Mar 25 '12 at 22:54
  • @SLaks - LINQ to objects. It tends to be easier to read than writing out whole foreach loops. – Travis J Mar 25 '12 at 22:55
  • What else did you expect? Did you expect less or more? (Also, you loop only runs 99 times, not 100.) – usr Mar 25 '12 at 22:55
  • @KeithNicholas - Thanks for the link! John is a very good source :) Part 2 - Where http://edulinq.googlecode.com/hg/posts/02-Where.html was very helpful. – Travis J Mar 25 '12 at 22:59

2 Answers2

35

Where() is O(1); it doesn't actually do any work.

Looping through the collection returned by Where() is O(n). ..

The O(n) that you're seeing is the result of ToList(), which is O(n).
If you pass a Where() query to an O(n2) algorithm, you will see the callback execute n2 times. (assuming the algorithm doesn't cache anywhere)

This is called deferred execution.

This is true about most if not all LINQ providers; it wouldn't make sense for a LINQ provider to eagerly execute all calls.


In the case of LINQ to objects, this assumes that the source collection's enumerator is O(n).
If you're using some strange collection which iterates in worse than O(n) (in other words, if its MoveNext() is worse than O(1)), Where() will be bounded by that.

To be more precise, the time complexity of enumerating a Where() query is the same as the time complexity of the original enumeration.

Similarly, I'm assuming that the callback is O(1).
If it isn't, you'll need to multiply the complexity of the callback by the complexity of the original enumeration.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • 2
    [Jon Skeet question about linq](http://stackoverflow.com/questions/215548/whats-the-hardest-or-most-misunderstood-aspect-of-linq). suits here... – gdoron Mar 25 '12 at 22:53
  • @SLaks - I changed `someList = someList.Where(thisList => thisList.includeItem == true).ToList();` to `var s = someList.Where(thisList => thisList.includeItem == true);` at which point when checked with a debugger s had been set up as an iterator. I now understand why `.Where()` is O(1), thank you. – Travis J Mar 25 '12 at 23:17
  • 2
    @TravisJ Exactly. I'd like to echo the recommendation that you read Jon Skeet's EduLINQ. You don't need to read all of it (it's very long), but you should read at least a couple of posts to help understand how this all works. – SLaks Mar 25 '12 at 23:18
0

Depends on the source of the collection of course.

I disagree with @SLaks that the algorithm is O(1) because a query to Where() will keep searching for a candidate that matches the condition. In that sense it would be worst case O(n) with n the amount of work to yield the entire collection before the Where clause.

However he has a point it depends on the algorithm that yields the collection (for instance if it is a list that is already been build yielding the list is O(n) with n the number of items in the collection. Furthermore the algorithm that looks if there is a match is not necessarily O(1). If the yield algorithm is O(n) and the match algorithm O(m) the time complexity is O(n*m).

Take for instance a collection of integers:

int[] test = new int[] {1,2,3,4,5,6,7,8,9,10,7,5,0,1,5,6};

if you want to return all the integers who occur at least two times you could do this with a Where() clause:

test.Where(x => test.Count(y => x == y) >= 2);

The algorithm would be in O(n^2)

Secondly you can also build up the collection with a lazy setting:

public IEnumerable<int> GenerateCollection () {
    //some very complex calculation, here replaced by a simple for loop
    for(int i = 0; i < 150; i++) {
        yield return i;
    }
}

Your algorithm however first generates the list. So the timecomplexity is O(n).

Notice however if you iterate the entire collection after the where the timecomplexity is still O(n*m) and not O(n*n*m). That's because once a candidate has been matched, it will not be reconsidered.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • 3
    You are completely wrong. A lone `Where()` call will _never_ search anything. – SLaks Mar 25 '12 at 23:02
  • Also, that would have been _all_ cases, not the worst case. You're confusing `Where()` with `FirstOrDefault()`. – SLaks Mar 25 '12 at 23:03
  • If you query Where for one element, it will not return the first candidate that fits? What does it return in that scenario? Lets say I have `new int[] {1,2,3,4,5}.Where(x => x>3)`. One can of course argue that it will return nothing, only an iterator where the query is postponed. But what if I add `Take(1)`. Thats basically a single query, or do I make a mistake? – Willem Van Onsem Mar 25 '12 at 23:05
  • That code is still O(1). It doesn't do anything until you enumerate the result. – SLaks Mar 25 '12 at 23:07
  • 1
    Enumerating the result is _always_ O(n). – SLaks Mar 25 '12 at 23:07
  • Your third paragraph is also completely wrong. I edited my answer to explain that. – SLaks Mar 25 '12 at 23:07
  • Point taken with the last one (The comment about `FirstOrDefault`). However its not quite clear if the question asks for First() or FirstOrDefault(). I agree `Where()` is in `O(1)` because of its lazyness. However one doesn't buy anything with that. I'm not convinced that my third paragraph is false. What if you want to query for instance all the items that occur at least three times with a where clause? – Willem Van Onsem Mar 25 '12 at 23:11
  • The question isn't talking about any of that. You're right that it also depends on the time complexity of the callback. – SLaks Mar 25 '12 at 23:13
  • 1
    I was just giving a general formula. I'm more a theoretical computer scientist, no offence :D. However I can't still see what wrong from a theoretical view. `FirstOrDefault` will indeed give a result in `O(m)` with `m` the callback. In most cases however these statements are used to migrate the data of one collection to another, so with an exhaustive iteration. (don't know if this is a valid english sentence). – Willem Van Onsem Mar 25 '12 at 23:19