Linq Caveats

Question

Linq is an awesome addition to .NET and I've found it has served me well in many situations even though I'm only beginning to learn about how to use Linq.

However, in the reading I've been doing about Linq, I've discovered that there are some subtle things a developer needs to keep an eye out for that can lead to trouble.

I've included one definite caveat that I've come across that is a result of deferred execution.

So I'm wondering, what other caveats exist for Linq that developers new to Linq should know about?

score 5 · Accepted Answer · answered Mar 17 '09 at 01:33

Building up a query within a foreach loop

IEnumerable<char> query = "Not what you might expect";
foreach(char vowel in "aeiou")
{
    query = query.Where(c => c != vowel);
}

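In C# versions before 5, the above code only removes the "u" from the string. Every lambda captures the same loop variable, and because of deferred execution none of them runs until after the loop, when vowel is 'u'.

To remove all the vowels there, copy the loop variable into a local first: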

IEnumerable<char> query = "Not what you might expect";
foreach(char vowel in "aeiou")
{
    char temp = vowel;
    query = query.Where(c => c != temp);
}

answered Mar 17 '09 at 01:33

mezoid


hmmm...this question wasn't as popular as I had hoped it would be. I guess since my answer got the most votes I'll have to accept it as the answer....if that ever changes I'll consider accepting a different answer.. – mezoid Mar 26 '09 at 23:53
This isn't a Linq caveat, it's just a general lambda (closure) caveat, a quirk of the implementation specific to the pre-C# 5 era. In C# 5 this works as expected. – nawfal Nov 29 '13 at 08:54

score 3 · Answer 2 · answered Mar 17 '09 at 02:05

I think LINQ is fairly solid, and there aren't a lot of big caveats. Nearly every "problem" I've run into is the result of deferred execution, and it's not really a problem, but rather a different way of thinking.

The biggest issue I've faced - LINQ is a game changer (or at least a rule bender) when it comes to profiling for performance. The deferred execution can make it much more difficult to profile an application at times, and can also dramatically change the runtime performance characteristics in unexpected ways. Certain LINQ operations seem almost magical with how fast they are, and others take a lot longer than I expected - but it's not always obvious from the code or profiler results.

That being said, in general, the deferred execution more than makes up for the cases where it's slowed down hand-coded routines. I much prefer the simpler, cleaner code to the code it replaced.

Also, I have found that the more I use LINQ to Objects, the more I have to rethink my design and rework my collections in general.

For example, until I started using Linq to Objects frequently, I had never realized how often I was exposing IList instead of IEnumerable when it wasn't absolutely necessary. I now completely understand why the MS design guidelines warn against using IList too often (for example, don't return IList just for the Count property). When I had methods that took an IList, passing in the IEnumerable result of a Linq query required either .ToList() or a reworking of the method's API.

But it's almost always worth the rethinking - I've found many places where passing an enumerable and using LINQ resulted in huge perf gains. Deferred execution is wonderful if you think about it and take full advantage of it. For example, using .Take() to restrict a collection to its first 2 elements, when that's all that's needed, was a bit more challenging pre-Linq, and has dramatically sped up some of my nastier loops.
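
To make the .Take() point concrete, here is a minimal sketch. The Numbers() iterator and the Produced counter are hypothetical names, added only to observe how far the source is actually read; they are not from any real code base.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Hypothetical counter: how many elements the source has produced so far.
    public static int Produced;

    // Hypothetical source: yields 1..1,000,000 lazily, one element per request.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static List<int> FirstTwoEven()
    {
        Produced = 0;
        // Where and Take are both deferred. ToList() drives the pipeline,
        // and Take stops asking the source for more once it has two matches.
        return Numbers().Where(n => n % 2 == 0).Take(2).ToList();
    }
}
```

Calling TakeDemo.FirstTwoEven() returns 2 and 4, and TakeDemo.Produced afterwards is a handful of elements rather than a million, because the rest of the source is never generated.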

score 1 · Answer 3 · edited May 23 '17 at 11:56

Good question. As Reed points out, they mostly stem from deferred execution (but unlike him, I find it a drawback; I keep wondering why deferred execution can't be carried out by memoizing the state). Here are a few examples, all more or less variants of the deferred execution problem.

1) I'm too lazy to do something on time

Linq is executed only on demand.

A common mistake newbies (myself in the past included) make is not knowing about deferred execution. For example, something like

 var p = listOfAMillionComplexItems.OrderBy(x => x.ComplexProperty);

runs in a jiffy, but the actual sorting does not happen until you enumerate the result. In other words, the query is not executed until you need its result. To get it executed, you need something like:

foreach(var item in p)...
//or
p.Count();
//or
p.ToList();
//etc

Think of them as SQL queries. If you have

var query = from i in otherValues where i > 5 select i;

think of it as akin to writing

string query = "SELECT i FROM otherValues WHERE i > 5";

Does the latter make a call to the database? No. You have to

Execute(query);

It's the same thing here with Linq.

2) I live in the present

Be cautious about variables inside Linq expressions getting changed later on.

To be safe, copy such a variable into a backup first and use the backup in the query, if the original can change before the query actually executes.

From here:

decimal minimumBalance = 500;
var customersOver500 = from c in customers 
                       where c.Balance > minimumBalance 
                       select c;

minimumBalance = 200;
var customersOver200 = from c in customers
                       where c.Balance > minimumBalance 
                       select c;

int count1 = customersOver500.Count();
int count2 = customersOver200.Count();

Suppose we have four customers with the following balances: 100, 300, 400 and 600. What will count1 and count2 be? They'll both be 3. The "customersOver500" query references the "minimumBalance" variable, but the value isn't read until the query results are iterated over (through a foreach loop, a ToList() call or even a Count() call as shown above). By the time the value is used to process the query, minimumBalance has already changed to 200, so both LINQ queries produce identical results (customers with a balance over 200).
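
A minimal sketch of the same effect and one possible fix, using a plain array of balances in place of the customers collection. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // The four balances from the example above.
    public static readonly decimal[] Balances = { 100m, 300m, 400m, 600m };

    // Both queries capture the same variable, so both see its final value.
    public static int[] SharedVariable()
    {
        decimal minimumBalance = 500;
        IEnumerable<decimal> over500 = Balances.Where(b => b > minimumBalance);
        minimumBalance = 200;
        IEnumerable<decimal> over200 = Balances.Where(b => b > minimumBalance);
        return new[] { over500.Count(), over200.Count() }; // 3 and 3
    }

    // One possible fix: each query gets its own variable that is never reassigned.
    public static int[] SeparateVariables()
    {
        decimal min500 = 500;
        IEnumerable<decimal> over500 = Balances.Where(b => b > min500);
        decimal min200 = 200;
        IEnumerable<decimal> over200 = Balances.Where(b => b > min200);
        return new[] { over500.Count(), over200.Count() }; // 1 and 3
    }
}
```

Another option is to materialize the first query with ToList() before changing the variable; which one fits depends on whether you want a snapshot of the data or just a stable threshold.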

3) My memory is too weak to remember the valuables of the past

The same as above, in a slightly different context.

Another example from the same site:

Consider this simple example of a method using LINQ-to-SQL to get a list of customers:

public IEnumerable<Customer> GetCustomers()
{
    using(var context = new DBContext())
    {
        return from c in context.Customers
               where c.Balance > 2000
               select c;
    }
}

Seems pretty harmless -- until you get an "ObjectDisposedException" when you try to enumerate the collection. Why? Because LINQ doesn't actually perform the query until you enumerate the results. The DBContext instance (which exposes the Customers collection) is disposed of when this call exits, so once you enumerate the collection, the query touches the disposed context's Customers and you get the exception.
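
The usual way out is to materialize the results while the context is still alive. The sketch below imitates the situation without a database: FakeContext is a hypothetical stand-in for a real data context, and its Balances property is an assumption made so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a data context: enumeration fails after Dispose().
public sealed class FakeContext : IDisposable
{
    private bool _disposed;

    public IEnumerable<int> Balances
    {
        get
        {
            // Iterator body: runs only when the sequence is enumerated.
            if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
            yield return 1500;
            yield return 2500;
            yield return 4000;
        }
    }

    public void Dispose() { _disposed = true; }
}

public static class DisposeDemo
{
    // Same shape as GetCustomers() above: the query escapes the using block unexecuted.
    public static IEnumerable<int> Deferred()
    {
        using (var context = new FakeContext())
        {
            return context.Balances.Where(b => b > 2000);
        }
    }

    // ToList() runs the query before the context is disposed.
    public static List<int> Materialized()
    {
        using (var context = new FakeContext())
        {
            return context.Balances.Where(b => b > 2000).ToList();
        }
    }

    // True if enumerating the deferred query throws ObjectDisposedException.
    public static bool DeferredThrows()
    {
        try { Deferred().ToList(); return false; }
        catch (ObjectDisposedException) { return true; }
    }
}
```

With a real context the trade-off is that ToList() loads every matching row into memory, so the caller can no longer compose further filters into the database query.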

4) Don't try to catch me, I might still slip away

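A try-catch around the statement that merely builds a query is pointless, because nothing executes there; the exception is thrown later, wherever the query is enumerated.

Global exception handling will serve better here.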

try
{
    wallet = bank.Select(c => Convert.ToInt32(""));
}
catch (Exception ex)
{
    MessageBox.Show("Cannot convert bad int");
    return;
}

foreach(int i in wallet) { }
  //kaboom! The FormatException is thrown here, outside the try block.

We neither get the intended error message, nor does the function exit via the return.
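
If you do want a local try-catch, one option is to force execution inside the try block. This is a minimal sketch with a hypothetical TryParseAll helper; it only handles FormatException, which is what Convert.ToInt32 throws for an empty or non-numeric string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TryCatchDemo
{
    // Hypothetical helper: returns false if any input is not a valid integer.
    public static bool TryParseAll(IEnumerable<string> inputs, out List<int> result)
    {
        try
        {
            // ToList() enumerates right here, so a bad input throws inside the try.
            result = inputs.Select(s => Convert.ToInt32(s)).ToList();
            return true;
        }
        catch (FormatException)
        {
            result = new List<int>();
            return false;
        }
    }
}
```

TryParseAll(new[] { "1", "2" }, out var ok) returns true with the values 1 and 2, while TryParseAll(new[] { "1", "" }, out var bad) returns false instead of blowing up later in some unrelated foreach.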

5) I'm not only unpunctual, but I don't learn from mistakes as well

A Linq query is executed every time you enumerate it, so avoid enumerating the same Linq enumerable more than once.

Suppose you have an IQueryable or IEnumerable returned from a Linq expression. Enumerating it gets the statement executed, but only once? No, every time you enumerate. This has bitten me in the past. If you have:

var p = listOfAMillionComplexItems.OrderBy(x => x.ComplexProperty);
MessageBox.Show(p.Count().ToString()); //long process.
MessageBox.Show(p.Count().ToString()); //long process still.

So it is better to do:

int i = p.Count(); //store in a variable to access count
//or better
var list = p.ToList(); //and start using list

6) If you don't know how to use me, I can cause side effects!

The same as above, just to show how reusing Linq enumerables can cause undesired behaviour.

Make sure your queries are free of side effects, since re-enumeration is much more common in Linq than you might think. To give a wild example:

p = bag.Select((t, i) => {if(i == 1) MessageBox.Show("Started off"); return t;});

If you enumerate twice, the message box is shown twice.
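
Points 5 and 6 can be observed directly by counting selector calls. In this minimal sketch the Calls counter is a hypothetical stand-in for an expensive computation or a side effect such as the message box.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Hypothetical counter: how many times the selector has run.
    public static int Calls;

    private static IEnumerable<int> BuildQuery()
    {
        return new[] { 1, 2, 3 }.Select(x => { Calls++; return x * 10; });
    }

    // Enumerating the query itself twice runs the selector for every element, twice.
    public static int CallsWhenEnumeratedTwice()
    {
        Calls = 0;
        IEnumerable<int> query = BuildQuery();
        int total = 0;
        foreach (int v in query) total += v;
        foreach (int v in query) total += v;
        return Calls; // 6
    }

    // Materializing once and enumerating the list runs the selector only once per element.
    public static int CallsWhenCached()
    {
        Calls = 0;
        List<int> list = BuildQuery().ToList();
        int total = 0;
        foreach (int v in list) total += v;
        foreach (int v in list) total += v;
        return Calls; // 3
    }
}
```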

7) Be wary of order I am executed when chaining

This applies not just to variables: chained Linq functions can also be executed in a different order from what you would normally expect (though the behaviour is correct). Don't think imperatively (step by step); think about how Linq will actually execute the chain.

For example:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());
f = f.Concat(f);
f.Distinct().Count(); // ??

What will be the count of distinct people in f? I would have guessed 100, but it is 200. The problem is that when the concatenation is actually executed, f is still the unexecuted d.Select(t => new Person()). So this effectively results in

f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));

which then has 200 distinct members. Here's a link for the actual problem
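
A minimal sketch of the difference, assuming a Person class that does not override Equals, so Distinct() compares by reference. Materializing the sequence before concatenating it with itself gives the count the imperative reading suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed to have default (reference) equality, as in the example above.
public class Person { }

public static class ConcatDemo
{
    // Each enumeration of the deferred Select creates 100 brand-new Person objects.
    public static int DeferredCount()
    {
        IEnumerable<Person> f = Enumerable.Range(1, 100).Select(t => new Person());
        f = f.Concat(f);
        return f.Distinct().Count(); // 200
    }

    // ToList() creates the 100 objects once; Concat then repeats the same instances.
    public static int MaterializedCount()
    {
        IEnumerable<Person> f = Enumerable.Range(1, 100).Select(t => new Person()).ToList();
        f = f.Concat(f);
        return f.Distinct().Count(); // 100
    }
}
```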

8) Hey, actually we're smarter than you think.

Not a caveat per se, but there are many cases where Linq can outperform your imperative-style program. So before optimizing, give it a second thought, and even benchmark.

Because deferred execution is essentially execution on demand, Linq is much more efficient than it appears. The iterator block "yields" one item at a time, as demanded, which makes it possible to stop execution when no more items are needed. Here is a very good question that details just that: Order of LINQ extension methods does not affect performance?

9) I'm not meant to crunch numbers

Abuse of Linq can make code inefficient as well as less readable.

For number-crunching algorithms, Linq is not the right tool, especially on large data sets, where the overhead of delegates and iterators adds up and the real cost of a chained query is easy to misjudge. Sometimes two plain for loops do the job better. The same can apply to raw SQL when compared to LINQ to SQL.

10) Hire me for the right job

Asking Linq to mind your normal business is a bad programming choice, and one that goes against readability.

Some examples:

medicines.Any(p =>
{
    Console.WriteLine(p);
    return false;
});

used in place of a foreach over an enumerable,

or

medicines = medicines.Select(p =>
{
    p.Id = 3;
    return p;
});

These are simply the wrong tools for the job.
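
For comparison, here is a minimal sketch of the second case written as a plain loop. The Medicine class is a hypothetical stand-in for whatever the medicines collection holds. The last method also shows a practical problem with the Select version: because Select is deferred, the Id is not changed at all until someone enumerates the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the medicines collection.
public class Medicine
{
    public int Id;
}

public static class RightToolDemo
{
    // The plain loop states the intent directly and runs immediately.
    public static void SetIds(List<Medicine> medicines, int id)
    {
        foreach (Medicine m in medicines)
        {
            m.Id = id;
        }
    }

    // The Select version mutates nothing while the query stays unenumerated.
    public static int IdAfterUnenumeratedSelect()
    {
        var medicines = new List<Medicine> { new Medicine { Id = 1 } };
        IEnumerable<Medicine> query = medicines.Select(p => { p.Id = 3; return p; });
        return medicines[0].Id; // still 1: the selector has not run
    }
}
```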

11) Debugging and Profiling can be a nightmare

It's hard to follow what's happening under the hood of a Linq expression from VS.

Not that it's entirely impossible, but it's a bit of a task to debug a Linq query as efficiently as non-Linq code from VS itself. Profiling also becomes a tad harder because of the nature of deferred execution. But that shouldn't stop anyone from writing the trivial one- or two-liners!

A bunch of caveats all related to deferred execution more or less! A ditto question here. Some related reading on SO:

Examples on when not to use LINQ

Pros and Cons of LINQ (Language-Integrated Query)

What is the biggest mistake people make when starting to use LINQ?

drawbacks of linq
