6

There are a few questions similar to this which deals with right input and output types like this. My question is what good practices, method naming, choosing parameter type, or similar can safeguard from deferred execution accidents?

These are most prevalent with IEnumerable which is a very common argument type because:

  • Follows the robustness principle "Be conservative in what you do, be liberal in what you accept from others"

  • Used extensively with Linq

  • IEnumerable is high in the collection hierarchy and predates newer collection types

However, it also introduces deferred execution. Now we might have gone wrong in designing our methods (especially extension methods) when we thought the best idea is to take the most basic type. So our methods looked like:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> lstObject)
{
    foreach (T t in lstObject)
       //some fisher-yates may be
}

The danger obviously is when we mix the above function with lazy Linq and its so susceptible.

var query = foos.Select(p => p).Where(p => p).OrderBy(p => p); //doesn't execute
//but
var query = foos.Select(p => p).Where(p => p).Shuffle().OrderBy(p => p);
//the second line executes up to a point.

A bigger edit:

Reopening this: a criticism of a language's functionality isn't constructive - however asking for good practices is where StackOverflow shines. Updated the question to reflect this.

A big edit here :

To clarify the above line - My question is not about the second expression not getting evaluated, seriously not. Programmers know it. My worry is about Shuffle method actually executing the query up to that point. See the first query, where nothing gets executed. Now similarly when constructing another Linq expression (which should be executed later), our custom function is playing the spoilsport. In other words, how to let the caller know Shuffle is not the kinda function they would want at that point of Linq expression. I hope the point is driven home. Apologies! :) Though its as simple as going and inspecting the method, I'm asking how do you guys typically program defensively..

The above example may not be that dangerous, but you get the point. That is certain (custom) functions don't go well with the Linq idea of deferred execution. The problem is not just about performance, but also about unexpected side-effects.

But a function like this works magic with Linq:

public static IEnumerable<S> DistinctBy<S, T>(this IEnumerable<S> source, 
                                              Func<S, T> keySelector)
{
    HashSet<T> seenKeys = new HashSet<T>(); //credits Jon Skeet
    foreach (var element in source)
        if (seenKeys.Add(keySelector(element)))
            yield return element;
}

As you can see both the functions take IEnumerable<>, but the caller wouldn't know how the functions react. So what are the general cautionary measures that you guys take here?

  1. Name our custom methods appropriately so that it gives the idea for the caller that it does bode well or not with Linq?

  2. Move lazy methods to a different namespace, and keep Linq-ish to another, so that it gives some sort of an idea at least?

  3. Do not accept an IEnumerable as parameter for immediately executing methods but instead take a more derived type or a concrete type itself which thus leaves IEnumerable for lazy methods alone? This puts the burden on the caller to do the execution of possible un-executed expressions? This is quite possible for us, since outside Linq world we hardly deal with IEnumerables, and most basic collection classes implement up to ICollection at least.

Or anything else? I particularly like the 3rd option, and that's what I was going with, but thought to get your ideas prior to. I have seen plenty of code (nice little Linq like extension methods!) from even good programmers that accept IEnumerable and do a ToList() or something similar on them inside the method. I don't know how they cope with the side-effects..

Edit: After a downvote and an answer, I would like to clarify that its not about programmers not knowing about how Linq works (our proficiency could be at some level, but thats a different thing), but its that many functions were written not taking Linq into account back then. Now chaining an immediately executing method along with Linq extension methods make it dangerous. So my question is there a general guideline programmers follow to let the caller know what to use from Linq side and what not to? It's more about programming defensively than if-you-don't-know-to-use-it-then-we-can't-help! (or at least I believe)..

KCD
  • 9,873
  • 5
  • 66
  • 75
nawfal
  • 70,104
  • 56
  • 326
  • 368
  • If this is cw, please someone make it so, I cant find an option – nawfal Nov 25 '12 at 23:46
  • 2
    Deferred execution has been possible since .Net 1.0, and easy since C# 2.0. LINQ just makes it more prevalent. – SLaks Nov 25 '12 at 23:49
  • 3
    @nawful: Huh? LINQ is not magic; it's just a bunch of fairly simple extension methods. I strongly recommend that you read Jon Skeet's [EduLINQ series](http://msmvps.com/blogs/jon_skeet/archive/tags/Edulinq/default.aspx) to understand how it works. – SLaks Nov 25 '12 at 23:51
  • @SLaks Ok i will read it, but does it have anything related to deferred execution from pre Linq days? I couldnt see it in the link you posted in one glance – nawfal Nov 25 '12 at 23:59
  • 2
    Deferred execution just means an enumerator that does work in `MoveNext()`. It has always been possible to write that by hand, and the C# compiler does it for you with the `yield` keyword. – SLaks Nov 26 '12 at 00:03
  • Regarding #3, you should accept a parameter that you need, if you only need to iterate a list, use IEnumerable, if you need the methods of ICollection, use that. Using a specific type of parameter as explanation of what a function does, instead of having a specific method name, seems weird.. – Patrick Nov 26 '12 at 00:06
  • @SLaks Ok I get it. Wasn't well versed with `yield`. Hmm, so I believe this question is still valid, considering Linq made it so common. Just trying to see how people write custom methods that will have to deal with Linq – nawfal Nov 26 '12 at 00:07
  • The real question is, should you do regression testing when moving to a new version of the language. The answer is yes. :-) Seriously, in those few cases where you need the entire enumeration iterated, you can call `ToList()` or `ToArray()` to force that behavior--its not perfect, I will admit. – SAJ14SAJ Nov 26 '12 at 00:19
  • 3
    That's a lot of text for asking "Do you use any markup on methods that use deferred execution". :) Either way it's a bit unconstructive since many people might do different things, and therefore this question doesn't have any "correct answer". – Patrick Nov 26 '12 at 00:33
  • 1
    There is no single answer but StackOverflow has many examples of questions that have multiple good answers and these questions are often the most valuable to the community. Please reopen this question – KCD Oct 10 '19 at 01:14

3 Answers3

7

As you can see both the functions take IEnumerable<>, but the caller wouldn't know how the functions react.

That's simply a matter of documentation. Look at the documentation for DistinctBy in MoreLINQ, which includes:

This operator uses deferred execution and streams the results, although a set of already-seen keys is retained. If a key is seen multiple times, only the first element with that key is returned.

Yes, it's important to know what a member does before you use it, and for things accepting/returning any kind of collection, there are various important things to know:

  • Will the collection be read immediately, or deferred?
  • Will the collection be streamed while results are returned?
  • If the declared collection type accepted is mutable, will the method try to mutate it?
  • If the declared collection type returned is mutable, will it actually be a mutable implementation?
  • Will the collection returned be changed by other actions (e.g. is it a read-only view on a collection which may be modified within the class)
  • Is null an acceptable input value?
  • Is null an acceptable element value?
  • Will the method ever return null?

All of these things are worth considering - and most of them were worth considering long before LINQ.

The moral is really, "Make sure you know how something behaves before you call it." That was true before LINQ, and LINQ hasn't changed it. It's just introduced two possibilities (deferred execution and streaming results) which were rarely present before.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Ok I get the point. Thanks. Can you provide me a good link that gives a clear explanation on difference between "reading deferred" and "returning streamed". Aren't they the consequence of one another? – nawfal Nov 26 '12 at 07:01
  • 1
    @nawfal: Well you can't stream without using deferred execution, but some operators are deferred without streaming: as soon as you ask for the first element, they read the *whole* of the input before yielding any output. `OrderBy` and `Reverse` are good examples of this. See http://msmvps.com/blogs/jon_skeet/archive/2010/03/25/just-how-lazy-are-you.aspx for some more of my thoughts on this. – Jon Skeet Nov 26 '12 at 07:06
  • Ok Jon accepted. But I must say not knowing of null being acceptable as input or method mutating the passed collection type etc are not as serious as not knowing if method is executed immediately or deferred, hence I made this question. Thanks btw! :) – nawfal Nov 26 '12 at 21:34
1

Use IEnumerable wherever it makes sense, and code defensively.

As SLaks pointed out in a comment, deferred execution has been possible with IEnumerable since the beginning, and since C# 2.0 introduced the yield statement, it's been very easy to implement deferred execution yourself. For example, this method returns an IEnumerable that uses deferred execution to return some random numbers:

public static IEnumerable<int> RandomSequence(int length)
{
    Random rng = new Random();
    for (int i = 0; i < length; i++) {
        Console.WriteLine("deferred execution!");
        yield return rng.Next();
    }
}

So whenever you use foreach to loop over an IEnumerable, you have to assume that anything could happen in between iterations. It could even throw an exception, so you may want to put the foreach loop inside a try/finally.

If the caller passes in an IEnumerable that does something dangerous or never stops returning numbers (an infinite sequence), it's not your fault. You don't have to detect it and throw an error; just add enough exception handlers so that your method can clean up after itself in the event something goes wrong. In the case of something simple like Shuffle, there's nothing to do; just let the caller deal with the exception.

In the rare case that your method really can't deal with an infinite sequence, consider accepting a different type like IList. But even IList won't protect you from deferred execution - you don't know what class is implementing IList or what sort of voodoo it's doing to come up with each element! In the super-rare case that you really can't allow any unexpected code to run while you iterate, you should be accepting an array, not any kind of interface.

Tara McGrew
  • 1,987
  • 19
  • 28
0

Deferred execution has nothing to do with types. Any linq method that uses iterators has potential for deferred execution if you write your code that way. Select(), Where(), OrderByDescending() for e.g. all use iterators and hence defer execution. Yes those methods expect an IEnumerable<T>, but that doesn't mean that IEnumerable<T> is the problem.

That is certain (custom) functions don't go well with the Linq idea of deferred execution. The problem is not just about performance, but also about unexpected side-effects.

So what are the general cautionary measures that you guys take here?

None. Honestly we use IEnumerable everywhere and don't have the problem of people not understanding "side effects". "the Linq idea of deferred execution" is central to its usefulness in things like Linq-to-SQL. It sounds to me like the design of the custom functions is not as clear as it could be. If people are writing code to use LINQ and they don't understand what it's doing, then that is the issue, not the fact that IEnumerable happens to be a base type.

All of your ideas are just wrappers around the fact that it sounds like you have programmers that just don't understand linq queries. If you don't need lazy execution, which it sounds like you don't, then just force everything to evaluate before the functions exit. Call ToList() on your results and return them in a consistent API that the consumer would like to work with - lists, arrays, collections or IEnumerables.

womp
  • 115,835
  • 26
  • 236
  • 269
  • I did not mean `IEnumerable` is the problem too. And secondly, its not about programmers not knowing about how Linq works, but its that many functions were written not taking Linq into account back then. Now chaining an immediately executing method along with Linq extension methods make it dangerous. So my question is there a general guideline programmers follow to let the caller know what to use from Linq side and what not to? I'm asking more about coding defensively.. – nawfal Nov 26 '12 at 00:14
  • I don't think I understand. If functions were written without taking Linq into account, why are they in danger of deferred execution? And if your extension methods are "dangerous" because they are returning unevaluated queries, why don't you apply the fix I suggest and force them to evaluate before returning? – womp Nov 26 '12 at 00:17
  • That is, see this part of my question `var query = foos.Select(p => p).Where(p => p).Shuffle().OrderBy(p => p);` danger (or in other words undesired) is when I mix an immediately executing function along with Linq method. And there is no question here what so ever about my methods returning unevaluated queries! They are executing it! thats the problem. I dont think you got my question.. – nawfal Nov 26 '12 at 00:21
  • please see my edit, you must have got my words wrong.. Sorry.. – nawfal Nov 26 '12 at 00:27
  • @nawfal: How is it dangerous to call Shuffle and then OrderBy? What is undesirable with that statement other than the fact that they are a bit counteractive..? – Patrick Nov 26 '12 at 00:30
  • @Patrick I would require a non executing (or deferred executing) query. The first expression provides it. But in the second case, since I'm doing a foreach inside Shuffle, the first half of the query gets executed. That's not what typically when I code Linq way I want. So my Shuffle is unlike other Linq methods I have chained with. So how to go defensive about it. General practices, tips etc.. – nawfal Nov 26 '12 at 00:34
  • @nawfal: Alright, thanks for explaining, I didn't understand what you were after until I read your edit in your question. – Patrick Nov 26 '12 at 00:34