Good question. As Reed points out, they all mostly stem from deferred execution (but unlike him, I find it a drawback; I keep wondering why deferred execution can't be carried out by remembering the state at the point of definition). Here are some examples, all more or less variants of the deferred execution problem.
1) I'm too lazy to do something on time
Linq is executed only on demand.
A common mistake newbies (myself included, in the past) make is not knowing about deferred execution. For example, something like
var p = listOfAMillionComplexItems.OrderBy(x => x.ComplexProperty);
runs in a jiffy, but the sorting is not actually performed until you enumerate the result. In other words, the query is not executed until you need its result. To get it executed, you need something like:
foreach(var item in p)...
//or
p.Count();
//or
p.ToList();
//etc
See them as SQL queries. If you have
var query = from i in otherValues where i > 5 select i;
think of it as akin to writing
string query = "SELECT i FROM otherValues WHERE i > 5";
Does the latter run a call to db? No. You have to
Execute(query);
It's the same thing here with Linq.
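To see the laziness for yourself, here is a minimal sketch (assuming C# 9 or later with top-level statements; the counter `keySelectorCalls` and the tiny list are made up for illustration). It counts how often the key selector runs before and after the query is enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int keySelectorCalls = 0;
var items = new List<int> { 3, 1, 2 };

// Building the query does no work: the key selector has not run yet.
var sorted = items.OrderBy(x => { keySelectorCalls++; return x; });
int callsAfterBuilding = keySelectorCalls;    // still 0

// Enumerating (here via ToList) is what actually runs the sort.
List<int> result = sorted.ToList();
int callsAfterEnumerating = keySelectorCalls; // now greater than 0

Console.WriteLine($"{callsAfterBuilding} call(s) before, {callsAfterEnumerating} after");
```

Nothing happens at the OrderBy line; all the work is done by ToList().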
2) I live in the present
Be cautious about variables inside Linq expressions getting changed later on.
To be safe, if a variable can change before the query is actually executed, copy it into a separate variable first and use the copy in the query.
From here:
decimal minimumBalance = 500;
var customersOver500 = from c in customers
                       where c.Balance > minimumBalance
                       select c;
minimumBalance = 200;
var customersOver200 = from c in customers
                       where c.Balance > minimumBalance
                       select c;
int count1 = customersOver500.Count();
int count2 = customersOver200.Count();
Suppose we have four customers with the following balances: 100, 300, 400 and 600. What will count1 and count2 be? They'll both be 3. The "customersOver500" references the "minimumBalance" variable, but the value isn't obtained until the query results are iterated over (through a for/each loop, a ToList() call or even a "Count()" call as shown above). At the time the value is used to process the query, the value for minimumBalance has already changed to 200, so both LINQ queries produce identical results (customers with a balance over 200).
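A minimal sketch of the two usual ways around this, using a plain array of balances instead of the customers collection from the quoted example (the names `threshold500` and `over500Snapshot` are mine, not from the original): either copy the value into a variable that is never reassigned, or materialize the query before the variable changes.

```csharp
using System;
using System.Linq;

decimal[] balances = { 100m, 300m, 400m, 600m };

decimal minimumBalance = 500m;
// Deferred query: reads minimumBalance when enumerated, not when defined.
var overMinimumLazy = balances.Where(b => b > minimumBalance);

// Option 1: copy the value into a variable that is never reassigned.
decimal threshold500 = minimumBalance;
var over500 = balances.Where(b => b > threshold500);

// Option 2: materialize immediately, while minimumBalance is still 500.
var over500Snapshot = balances.Where(b => b > minimumBalance).ToList();

minimumBalance = 200m;

int lazyCount = overMinimumLazy.Count();   // 3: sees the new value, 200
int copyCount = over500.Count();           // 1: only 600 is over 500
int snapshotCount = over500Snapshot.Count; // 1: evaluated before the change

Console.WriteLine($"{lazyCount} {copyCount} {snapshotCount}");
```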
3) My memory is too weak to remember the valuables of the past
The same problem as above, in a slightly different context.
This one is from the same site:
Consider this simple example of a method using LINQ-to-SQL to get a list of customers:
public IEnumerable<Customer> GetCustomers()
{
    using(var context = new DBContext())
    {
        return from c in context.Customers
               where c.Balance > 2000
               select c;
    }
}
Seems pretty harmless -- until you get an "ObjectDisposedException" when you try and enumerate the collection. Why? Because LINQ doesn't actually perform the query until you try and enumerate the results. The DBContext class (which exposes the Customers collection) is disposed of when this call exits. Once you try and enumerate through the collection, the DBContext.Customers class is referenced and you get the exception.
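The same trap can be reproduced without a database. The sketch below is only an analogy: it swaps the DBContext for a StringReader (my substitution, not part of the quoted example) and shows that calling ToList() inside the using block runs the query while the resource is still alive. The helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Broken: returns a deferred query that still needs the disposed reader.
IEnumerable<string> ReadTwoLinesLazily(string text)
{
    using (var reader = new StringReader(text))
    {
        return Enumerable.Range(0, 2).Select(_ => reader.ReadLine() ?? "");
    }
}

// Fixed: ToList() runs the query while the reader is still alive.
List<string> ReadTwoLinesEagerly(string text)
{
    using (var reader = new StringReader(text))
    {
        return Enumerable.Range(0, 2).Select(_ => reader.ReadLine() ?? "").ToList();
    }
}

bool lazyThrew = false;
try
{
    foreach (var line in ReadTwoLinesLazily("a\nb")) { }
}
catch (ObjectDisposedException)
{
    lazyThrew = true; // the reader was disposed before enumeration started
}

List<string> lines = ReadTwoLinesEagerly("a\nb");
Console.WriteLine($"lazy threw: {lazyThrew}, eager read {lines.Count} lines");
```

The price of the fix is that the whole result is loaded into memory at once, which is usually what you want when the resource cannot outlive the method anyway.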
4) Don't try to catch me, I might still slip away
A try-catch around a Linq statement is pointless unless used wisely, because the exception is thrown where the query is enumerated, not where it is defined.
Global exception handling may serve better here.
try
{
    wallet = bank.Select(c => Convert.ToInt32(""));
}
catch (Exception ex)
{
    MessageBox.Show("Cannot convert bad int");
    return;
}
foreach(int i in wallet)
    //kaboom!
We neither get the intended error message, nor does the function exit via return.
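If you do want a local try-catch, it has to wrap the enumeration. A minimal sketch, assuming a console program rather than the WinForms code above (the two boolean flags are hypothetical and only record which catch block fires):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] bank = { "" };
IEnumerable<int> wallet = Enumerable.Empty<int>();
bool caughtAtDefinition = false;
bool caughtAtEnumeration = false;

try
{
    // Nothing is converted here, so nothing can throw here.
    wallet = bank.Select(c => Convert.ToInt32(c));
}
catch (FormatException)
{
    caughtAtDefinition = true; // never reached
}

try
{
    // ToList() forces the conversions, so the exception surfaces inside this try.
    _ = wallet.ToList();
}
catch (FormatException)
{
    caughtAtEnumeration = true;
}

Console.WriteLine($"definition: {caughtAtDefinition}, enumeration: {caughtAtEnumeration}");
```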
5) I'm not only unpunctual, but I don't learn from mistakes as well
A Linq query is executed each time you enumerate it. So do not reuse Linq enumerables.
Suppose you have an IQueryable or IEnumerable returned from a Linq expression. Enumerating the collection will get the statement executed, but only once? No, every time you do. This has bitten me in the past. If you have:
var p = listOfAMillionComplexItems.OrderBy(x => x.ComplexProperty);
MessageBox.Show(p.Count().ToString()); //long process.
MessageBox.Show(p.Count().ToString()); //long process still.
So it is better to do
int i = p.Count(); //store in a variable to access count
//or better
var list = p.ToList(); //and start using list
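A small sketch that makes the repeated work visible (the `evaluations` counter is made up for illustration, and Sum() stands in for any operation that enumerates the query):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;
int[] source = { 1, 2, 3, 4, 5 };
var doubled = source.Select(x => { evaluations++; return x * 2; });

int firstSum = doubled.Sum();            // runs the selector 5 times
int secondSum = doubled.Sum();           // runs it 5 more times
int afterTwoEnumerations = evaluations;  // 10

List<int> cached = doubled.ToList();     // one last pass over the query
int cachedSum1 = cached.Sum();           // reads the list, no selector calls
int cachedSum2 = cached.Sum();
int afterUsingList = evaluations;        // 15, and it stays there

Console.WriteLine($"{afterTwoEnumerations} {afterUsingList}");
```

If the selector had side effects (a message box, a log line, a database write), they would be repeated on every enumeration in exactly the same way.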
6) If you don't know how to use me, I can cause side effects!
The same as above, just to show how reusing Linq enumerables can cause undesired behaviour.
Make sure you don't put side effects in your queries (since re-enumerating is so common in Linq). To give a wild example,
p = bag.Select((t, i) => {if(i == 1) MessageBox.Show("Started off"); return t;});
If you enumerate twice you know what undesired thing can happen.
7) Be wary of the order in which I am executed when chaining
This applies not just to variables: even chained Linq functions can be executed in a different order from what you would normally expect (though the behaviour is correct). Don't think imperatively (step by step); think about how Linq will actually execute it.
For example,
var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());
f = f.Concat(f);
f.Distinct().Count(); // ??
What will be the count of distinct people in f? I would have guessed 100, but no, it is 200. The problem is that when the concatenation is actually executed, f is still the unexecuted d.Select(t => new Person()). So this effectively yields
f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));
which then has 200 distinct members. Here's a link for the actual problem
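Here is a runnable sketch of the same effect, with plain `object` instances standing in for Person (my substitution; like a class that does not override Equals, they compare by reference). Materializing with ToList() before concatenating brings the count back to 100:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var d = Enumerable.Range(1, 100);

// Lazy: each half of the Concat re-runs the Select and creates new objects.
IEnumerable<object> lazy = d.Select(t => new object());
lazy = lazy.Concat(lazy);
int lazyDistinct = lazy.Distinct().Count();                  // 200

// Materialized: the 100 objects are created once and then reused.
IEnumerable<object> materialized = d.Select(t => new object()).ToList();
materialized = materialized.Concat(materialized);
int materializedDistinct = materialized.Distinct().Count();  // 100

Console.WriteLine($"{lazyDistinct} vs {materializedDistinct}");
```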
8) Hey, actually we're smarter than you think.
Not a caveat per se, but there are many cases where Linq can outperform your imperative-style program. So before optimizing, give it a second thought, and even benchmark.
Because deferred execution runs only on demand, Linq is often much more efficient than it appears. The iterator block "yields" one item at a time, as demanded, which makes it possible to stop execution as soon as no more items are needed. Here is a very good question that details just that: Order of LINQ extension methods does not affect performance?
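A minimal sketch of that early exit (the counter `squaresComputed` is hypothetical): although the source has a million elements, First() stops pulling items as soon as one matches.

```csharp
using System;
using System.Linq;

int squaresComputed = 0;
var squares = Enumerable.Range(1, 1_000_000)
    .Select(x => { squaresComputed++; return (long)x * x; });

// First() pulls items one at a time and stops at the first match.
long firstOver100 = squares.First(sq => sq > 100);  // 121, i.e. 11 * 11

// Only the squares of 1 through 11 were ever computed.
Console.WriteLine($"{firstOver100} found after {squaresComputed} squares");
```

An eager version that first built a list of all million squares and then searched it would do far more work for the same answer.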
9) I'm not meant to crunch numbers
Abuse of Linq can make code inefficient as well as less readable.
For number-crunching algorithms, Linq is not the right tool, especially for large data sets where the cost can grow very quickly. Sometimes just two for loops would serve better. The same can apply to raw SQL when compared to LINQ to SQL.
10) Hire me for the right job
Asking Linq to mind your normal business is a bad programming choice, and it goes against readability.
Some examples:
medicines.Any(p =>
{
    Console.WriteLine(p);
    return false;
});
as a substitute for a foreach over an enumerable,
or
medicines = medicines.Select(p =>
{
    p.Id = 3;
    return p;
});
Just the wrong tool for the job.
11) Debugging and Profiling can be a nightmare
It's hard to follow what's happening under the hood of a Linq expression from VS
Not that it's entirely impossible, but it's a bit of a task to debug a Linq query as efficiently as non-Linq code from VS itself. Profiling also becomes a tad harder because of the nature of deferred execution. But that shouldn't stop anyone from writing the trivial one- or two-liners!
A bunch of caveats, all more or less related to deferred execution! There is a similar question here. Some related reading on SO:
Examples on when not to use LINQ
Pros and Cons of LINQ (Language-Integrated Query)
What is the biggest mistake people make when starting to use LINQ?
drawbacks of linq