2

Which way of null checking would be better to use and why?

var myVar = myCollection.FirstOrDefault(q => q.Id == 10);
if (myVar != null)
{
   anotherVar = myVar.MyName;
}

or:

var myVar = myCollection.Where(q => q.Id == 10);
if (myVar.Any())
{
   anotherVar = myVar.First().MyName;
}

or just no difference?

rem
  • 16,745
  • 37
  • 112
  • 180
  • your second option doesn't involve null checking? – bas Feb 07 '13 at 20:53
  • @Haxx: The second way can't be null, as it is a collection, which can't be null, only empty. – Femaref Feb 07 '13 at 20:54
  • my point exactly. so it's not about "Which way of null checking would be better to use and why?" or am I still missing the point? :) – bas Feb 07 '13 at 20:55
  • Just opinion, but the first way is more readable to me. Though it isn't like the second one is obfuscated. – Gray Feb 07 '13 at 20:56
  • @Gray: Actually it's just obfuscated due to the bad naming. The first is an object and the second is a query but both have the same name. If you would name it `var itemsWithId10 = myCollection.Where(q => q.Id == 10);`, then `if(itemsWithId10.Any())` would be clearer. – Tim Schmelter Feb 07 '13 at 22:13

6 Answers6

1

Could be premature optimization, but the first approach needs only one execution hence it's a little bit more efficient.

I'm using the second approach when i might need the query later again lazily. The FirstOrDefault "finishes" it.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • the first approach will not only iterate once, but it will also _stop_ as soon as it finds a match. 'where' will iterate to the end even after finding the first match. (linq to objects) – Eren Ersönmez Feb 07 '13 at 21:14
  • 2
    @ErenErsönmez Your second statement is false. `Where` uses deferred execution, so it will not iterate anything until it needs to, when it needs to. Calling `Where(predicate).First()` iterates the same number of items as `First(predicate)`. – Servy Feb 07 '13 at 21:14
  • @ErenErsönmez: Addition to Servy's comment: `Enumerable.Where` is comparable with an `if`-statement in a loop. You can break whenever you want. So `Where(...).First()` breaks after the first matching element, if you use `Where(...).Take(10)` it will execute the loop until the predicate of the `Where` returned 10 times `true`. – Tim Schmelter Feb 07 '13 at 21:33
  • @ErenErsönmez: Read following links to learn when the order of `Linq` extension methods matters: 1) [Does the order of LINQ functions matter?](http://stackoverflow.com/questions/7499384/does-the-order-of-linq-functions-matter) and (my own) 2) [Order of LINQ extension methods does not affect performance?](http://stackoverflow.com/questions/10110013/order-of-linq-extension-methods-does-not-affect-performance). – Tim Schmelter Feb 07 '13 at 21:45
1

You want one element.

So one FirstOrDefault() with a null check is clearer than

Where
Any
First

For performance, this won't change your life in most cases. But I would take first for "readability".

Raphaël Althaus
  • 59,727
  • 6
  • 96
  • 122
  • It most certainly can change your life when it comes to performance. If `myCollection` is an `IQueryable` that hits a database, iterating it twice is a *big* deal. Or if it's a list with millions of items, and the first matched item is right near the end, iterating it twice matters *a lot*. It could also be an enumerable that does a lot of computation while iterating, such as a complex Linq-to-objects query, or be a data structure that can't be iterated super quickly (i.e. LinkedList). – Servy Feb 07 '13 at 21:00
  • @Servy: I think that even a linq provider that hits the database does not iterates "millions of items" twice, `Any` is optimized and queries are cached. – Tim Schmelter Feb 07 '13 at 21:05
  • 2
    @Servy Yes of course, we can find extreme cases (I don't agree with the two hits on db as a "major" deal, but anyway). But this is optimisation. I take care of optimisation when it's a problem, readability is more often a problem (my opinion). – Raphaël Althaus Feb 07 '13 at 21:06
  • @RaphaëlAlthaus In most cases you're right, optimization is not of the utmost importance, but doing an extra DB operation is one of the few things for which even doing one extra query is likely to have a human-noticeable impact. If you optimize nothing else in your programs, make sure that you perform as few round trips to the DB as you can. – Servy Feb 07 '13 at 21:08
  • @RaphaëlAlthaus: In my opinion the first is not _clearer_ but shorter (and needs only one execution). `Where`, `Any` is meaningful and can be useful if you want to filter the query later further or use it deferred. `FirstOrDefault` finishes the query and it a little bit clumsy since you use a `null`check to determine if a query has `Any` elements. If you select a value type it's even error-prone. – Tim Schmelter Feb 07 '13 at 21:10
  • @TimSchmelter In the case of a query provider the results are only ever of size 0 or 1, but you have the round trips to the DB to deal with. `Any` won't actually get the results, it just gets a boolean indicating if there are any. I'd be very concerned if such queries actually returned the results as the whole point of `Any` is that it won't. This means `First` *must* do another round trip. The iterating millions of items was referring to a large in memory collection, not an `IQueryable`. – Servy Feb 07 '13 at 21:10
  • @TimSchmelter See my recent edit for a pattern that uses `Where` `Any` and `First`, but doesn't have any of the performance pitfalls or the bug with having a valid default value. – Servy Feb 07 '13 at 21:14
  • @TimSchmelter "clearer" is of course a question of point of view... I won't argue on this. – Raphaël Althaus Feb 07 '13 at 21:15
1

I'd prefer the first way, as it is much clearer what you intend to do.

Femaref
  • 60,705
  • 7
  • 138
  • 176
  • Imho the second approach is more readable since it uses meaningful methods. `Where` to specify the filter and `Any` to check if there's _any_ result. But the `FirstOrDefault` needs only one execution. – Tim Schmelter Feb 07 '13 at 20:58
1

The first option can be broken as a result of a null item that passes the check, thus making you think there are no matching items, even if they are. It doesn't apply to this particular example, but it could apply in the general case.

However, the second example here iterates the source sequence twice (sometimes), once to see if there are any results and then again to get that result. If the source sequence needs to perform a database query to get the results that could be very expensive. Because of this you should only use this option if you're sure that you have an in-memory collection you're dealing with and that it's not particularly large (or that the first item you need will be found quickly).

In the event that you need to worry about this particular edge case with the first option, or you want to get the benefits of using Any and First (for their superior semantic representation of what you want) with the performance benefits of FirstOrDefault you can use this pattern:

var myVar = myCollection.Where(q => q.Id == 10)
    .Take(1)
    .ToList();
if (myVar.Any())
{
   anotherVar = myVar.First().MyName;
}

You could make an extension method to shorten this if you wanted:

public static IEnumerable<T> FirstOrEmpty<T>(this IEnumerable<T> source)
{
    //TODO: null check arguments
    using (var iterator = source.GetEnumerator())
    {
        if (iterator.MoveNext())
            return new T[] { iterator.Current };
        else
            return Enumerable.Empty<T>();
    }
}

public static IEnumerable<T> FirstOrEmpty<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return FirstOrEmpty(source.Where(predicate));
}

This would allow you to just write:

var myVar = myCollection.FirstOrEmpty(q => q.Id == 10);
if (myVar.Any())
{
   anotherVar = myVar.First().MyName;
}
Servy
  • 202,030
  • 26
  • 332
  • 449
0

I tend to use expressions that involve Where followed by FirstOrDefault:

var myVar = myCollection.Where(x => x.Id==10).FirstOrDefault();
if (myVar != null)
{
   anotherVar = myVar.MyName;
}

As Raphael Althaus points out above, you want one variable to be null checked, in my opinion you should first make a query with the condition and then pick the first one if exists, and check that.

SilentDoc
  • 357
  • 2
  • 8
0

The two approches are different, here the ILs:

FirstOrDefault + if (myVar != null)

IL_0067:  ldsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_006C:  brtrue.s    IL_007F
IL_006E:  ldnull      
IL_006F:  ldftn       b__0
IL_0075:  newobj      System.Func<<>f__AnonymousType0<System.Int32,System.String>,System.Boolean>..ctor
IL_007A:  stsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_007F:  ldsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_0084:  call        System.Linq.Enumerable.FirstOrDefault  <-----
IL_0089:  stloc.2     // myVar
IL_008A:  ldloc.2     // myVar
IL_008B:  brfalse.s   IL_0094
IL_008D:  ldloc.2     // myVar
IL_008E:  callvirt    <>f__AnonymousType0<System.Int32,System.String>.get_MyName
IL_0093:  stloc.1     // anotherVar
IL_0094:  ldloc.1     // anotherVar
  • FirstOrDefault

Where + if (myVar.Any())

IL_0067:  ldsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_006C:  brtrue.s    IL_007F
IL_006E:  ldnull      
IL_006F:  ldftn       b__0
IL_0075:  newobj      System.Func<<>f__AnonymousType0<System.Int32,System.String>,System.Boolean>..ctor
IL_007A:  stsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_007F:  ldsfld      UserQuery.CS$<>9__CachedAnonymousMethodDelegate1
IL_0084:  call        System.Linq.Enumerable.Where   <-----
IL_0089:  stloc.2     // myVar
IL_008A:  ldloc.2     // myVar
IL_008B:  call        System.Linq.Enumerable.Any     <-----
IL_0090:  brfalse.s   IL_009E
IL_0092:  ldloc.2     // myVar
IL_0093:  call        System.Linq.Enumerable.First   <-----
IL_0098:  callvirt    <>f__AnonymousType0<System.Int32,System.String>.get_MyName
IL_009D:  stloc.1     // anotherVar
IL_009E:  ldloc.1     // anotherVar
  • Where
  • Any
  • First

It looks like micro-optimization but the first one should be faster because the single enumeration at FirstOrDefault but if the enumerator after the Where contains not many element with q.Id == 10 it doesn't really matter. Between the two I would definitely prefer the most clear syntax.

BTW I a mot a big fan of null... what about usign if (myVar != default(T)) instead?

Filippo Vitale
  • 7,597
  • 3
  • 58
  • 64