8

Background

The background for this is that I had a recent conversation in the comments with another clearly knowledgeable user about how LINQ is compiled. I first "summarized" and said LINQ was compiled to a for loop. While this isn't correct, my understanding from other stacks such as this one is that the LINQ query is compiled to a lambda with a loop inside of it. This is then called when the variable is enumerated for the first time (after which the results are stored). The other user said that LINQ takes additional optimizations such as hashing. I couldn't find any supporting documentation either for or against this.

I know this seems like a really obscure point but I have always felt that if I don't understand how something works completely, its going to be difficult to understand why I'm not using it correctly.

The Question

So, lets take the following very simple example:

var productNames = 
    from p in products 
    where p.Id > 100 and p.Id < 5000
    select p.ProductName;

What is this statement actually compiled to in CLR? What optimizations does LINQ take over me just writing a function that manually parses the results? Is this just semantics or is there more to it than that?

Clarification

Clearly I'm asking this question because I don't understand what the inside of the LINQ "black box" looks like. Even though I understand that LINQ is complicated (and powerful), I'm mostly looking for a basic understanding of either the CLR or a functional equivalent to a LINQ statement. There are great sites out there for helping understand how to create a LINQ statement but very few of these seem to give any guidance on how those are actually compiled or run.

Side Note - I will absolutely read through the John Skeet series on linq to objects.

Side Note 2 - I shouldn't have tagged this as LINQ to SQL. I understand how ORM's and micro-ORM's work. That is really besides the point of the question.

Community
  • 1
  • 1
drew_w
  • 10,320
  • 4
  • 28
  • 49
  • `the LINQ query is compiled to a lambda with a loop inside of it. This is then called when the variable is enumerated for the first time (after which the results are stored). The other user said that LINQ takes additional optimizations such as hashing` Most of that is wrong. – SLaks Dec 13 '13 at 18:57
  • 3
    [Reimplementing LINQ to Objects - By Jon Skeet](http://msmvps.com/blogs/jon_skeet/archive/2010/09/03/reimplementing-linq-to-objects-part-1-introduction.aspx) – Habib Dec 13 '13 at 18:58
  • 1
    See http://msmvps.com/blogs/jon_skeet/archive/tags/Edulinq/default.aspx for a _thorough_ explanation of LINQ to Objects. – SLaks Dec 13 '13 at 18:58
  • Since you've tagged this linq to sql, then in those cases the compiler is doing very very little. It's mostly just passing a representation of the code of your query to a query provider, which is doing...whatever it wants with that information. – Servy Dec 13 '13 at 19:01
  • @SLaks I'm happy to revise the wording. The point of the question is to say "What - at a low level - is LINQ actually?" – drew_w Dec 13 '13 at 19:02
  • Again: LINQ to _what_? LINQ is just a set of extension methods. – SLaks Dec 13 '13 at 19:02
  • @drew_w And that's rather vague. Are you talking about how the `Enumerable.Where` method is implemented? Are you interested in how the query syntax translates into method syntax, are you interested in how query providers work, how code is compiled into expressions to be sent to a query provider, the language features used to turn the example you have into essentially C# 2.0 code, etc.? There are simply so many related concepts here that you don't clarify. Obviously bringing them all up makes the question too broad. – Servy Dec 13 '13 at 19:04
  • You will only see hashing come into play where it can be, for example string compare. If you where was `where p.ProductName == 'GoodStuff'` the hashing would come into play. – Hogan Dec 13 '13 at 19:04
  • 1
    @Hogan No it wouldn't...If you used something like `Join`, `GroupBy`, `Distinct`, `ToLookup`, etc., sure. But `Where` won't use a hash based algorithm/structure at all. – Servy Dec 13 '13 at 19:05
  • @Hogan: Wrong. Hashing is used for methods like `Join()`, `Distinct()`, and `GroupBy()`, regardless of datatypes. – SLaks Dec 13 '13 at 19:06
  • Servy and SLaks - You got me, I saw this on a `Intersect()` call not on a `Where()` now that I think about it. – Hogan Dec 13 '13 at 19:10
  • @Habib - What makes you sure that Jon implemented this the same way Microsoft did? – Hogan Dec 13 '13 at 19:12
  • 1
    @Hogan Read through the posts. He discusses in each one how MS implements theirs, and when/why he chooses to diverge from their implementation. – Servy Dec 13 '13 at 19:14
  • 1
    @Hogan, [don't ever question Jon Skeet](http://meta.stackexchange.com/questions/9134/jon-skeet-facts) :) – Habib Dec 13 '13 at 19:15
  • `I understand that LINQ is extremely complicated` You clearly haven't read through any of Jon's articles yet. The majority of LINQ methods are *super* simple. Outside of handling argument validation, and optional arguments, a huge number of LINQ methods can be functionally replicated in as few as one or two statements. There are only a small number that really are rather complex (i.e. all of the ordering related methods). – Servy Dec 13 '13 at 19:23
  • @Servy Hopefully my clarification in the question helped in figuring out what I am looking for. Another way to phrase the question is "Whats so different about LINQ versus me just writing a function withe a loop in it" – drew_w Dec 13 '13 at 19:24
  • @drew_w At the day, if both are written properly, nothing. The design of LINQ is to make writing code easier and clearer. If it's different it's generally because either the LINQ or non-LINQ solution wasn't written effectively. – Servy Dec 13 '13 at 19:25
  • @Servy So back to my second part of the question "Is this just semantics" ... the answer is yes? – drew_w Dec 13 '13 at 19:27
  • @drew_w In theory, yes, in practice, many people, when not using LINQ, don't write programs that would be functionally equivalent to what LINQ generates. There are certain practices and concepts that LINQ code generally follows, that many people (rarely) chose to use outside of LINQ based solutions, such as the idea of streaming/piplining processing of data. You certainly *can* do it without linq, and people do do it, but not as often as they would if LINQ didn't exist. – Servy Dec 13 '13 at 19:30
  • @Servy Are you talking about things such as a yield return (http://msdn.microsoft.com/en-us/library/9k7k7cf0.aspx)? – drew_w Dec 13 '13 at 19:31
  • @drew_w With respect to what statement? – Servy Dec 13 '13 at 19:34
  • @Servy You said: `There are certain practices and concepts that LINQ code generally follows, that many people (rarely) chose to use outside of LINQ based solutions, such as the idea of streaming/piplining processing of data`. I thought that yield return might be an example of a less common practice? Just trying to figure out what linq would do that I might not normally think of! – drew_w Dec 13 '13 at 19:37
  • @drew_w Actually you should very rarely need to use `yield return` because you can instead use LINQ. `yield return` would be one, of several possible ways, of providing that same general behavior without actually using linq. – Servy Dec 13 '13 at 19:57

2 Answers2

13

For LINQ to Objects, this is compiled into a set of static method calls:

var productNames = 
    from p in products 
    where p.Id > 100 and p.Id < 5000
    select p.ProductName;

Becomes:

IEnumerable<string> productNames = products
                                       .Where(p => p.Id > 100 and p.Id < 5000)
                                       .Select(p => p.ProductName);

This uses extension methods defined in the Enumerable type, so is actually compiled to:

IEnumerable<string> productNames = 
     Enumerable.Select(
        Enumerable.Where(products, p => p.Id > 100 and p.Id < 5000),
        p => p.ProductName
     );

The lambda expressions to handle this are turned into methods by the compiler. The lambda in the where is turned into a method which can be set to a Func<Product, Boolean>, and the select into a Func<Product, String>.

For a thorough explanation, see Jon Skeet's blog series: Reimplementing LINQ to Objects. He walks through the entire process of how this works, including the compiler transformations (from query syntax to method calls), how the methods are implemented, etc.

Note that LINQ to Sql and IQueryable<T> implementations are different. The Expression<T> that is generated by the lambda is passed into the query provider, which in turn is "transformed" in some manner (it's up to the provider how to do this) into calls, typically run on the server in the case of an ORM.


For this method, for example:

    private static IEnumerable<string> ProductNames(IEnumerable<Product> products)
    {
        var productNames =
            from p in products
            where p.Id > 100 && p.Id < 5000
            select p.ProductName;
        return productNames;
    }

Gets compiled to the following IL:

  .method private hidebysig static class [mscorlib]System.Collections.Generic.IEnumerable`1<string> ProductNames(class [mscorlib]System.Collections.Generic.IEnumerable`1<class ConsoleApplication3.Product> products) cil managed
{
    .maxstack 3
    .locals init (
        [0] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable,
        [1] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable2)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
    L_0007: dup 
    L_0008: brtrue.s L_001d
    L_000a: pop 
    L_000b: ldnull 
    L_000c: ldftn bool ConsoleApplication3.Program::<ProductNames>b__2(class ConsoleApplication3.Product)
    L_0012: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool>::.ctor(object, native int)
    L_0017: dup 
    L_0018: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
    L_001d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> [System.Core]System.Linq.Enumerable::Where<class ConsoleApplication3.Product>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, bool>)
    L_0022: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
    L_0027: dup 
    L_0028: brtrue.s L_003d
    L_002a: pop 
    L_002b: ldnull 
    L_002c: ldftn string ConsoleApplication3.Program::<ProductNames>b__4(class ConsoleApplication3.Product)
    L_0032: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, string>::.ctor(object, native int)
    L_0037: dup 
    L_0038: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
    L_003d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!1> [System.Core]System.Linq.Enumerable::Select<class ConsoleApplication3.Product, string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, !!1>)
    L_0042: stloc.0 
    L_0043: ldloc.0 
    L_0044: stloc.1 
    L_0045: br.s L_0047
    L_0047: ldloc.1 
    L_0048: ret 
}

Note that these are normal call instructions for the method calls. The lambdas get converted into other methods, such as:

[CompilerGenerated]
private static bool <ProductNames>b__2(Product p)
{
    return ((p.Id > 100) && (p.Id < 0x1388));
}
Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • But what does the .Where() and .Select() get compiled to? Internally CIL doesn't have a .Where(). See: http://en.wikipedia.org/wiki/List_of_CIL_instructions I'm trying to figure out what this resolves to at a low level. Thats the real purpose of this question! – drew_w Dec 13 '13 at 19:00
  • @drew_w I answered that - it gets transformed into Enumerable.Where and Enumerable.Select method calls. – Reed Copsey Dec 13 '13 at 19:01
  • This answer does not address OP's core question `What is this statement actually compiled to in CLR?` – Shiva Dec 13 '13 at 19:02
  • @drew_w: Those are regular static functions. Jon Skeet shows you how to write them yourself? – SLaks Dec 13 '13 at 19:03
  • @Shiva: It gets compiled to normal `call` instructions. – SLaks Dec 13 '13 at 19:03
  • @drew_w Does that extra detail give you what you want? – Reed Copsey Dec 13 '13 at 19:11
  • @ReedCopsey - Why do you think that Jon implemented this the same way Microsoft did? – Hogan Dec 13 '13 at 19:13
  • @Hogan He didn't implement it the same way - he was trying to do a thorough explanation of how it works, but he even remarks that his implementation differs from the official one. He was just trying to show how the API *could* be implemented to make it understandable. – Reed Copsey Dec 13 '13 at 19:14
  • @ReedCopsey - This is my point, the question is "How is this implemented by Microsoft". Jon's series, while great reading, does not answer this question at all. – Hogan Dec 13 '13 at 19:16
  • @Hogan I wouldn't say it doesn't answer it at all. While it may not be the exact code, it is almost certainly sufficient to gain an understanding as to what is going on behind the scenes, and the primary tools/approaches being used to solve these problems. – Servy Dec 13 '13 at 19:20
  • @ReedCopsey I feel you answered the first part of the question here - but I'm highly interested in the answers to the other two parts of this as well: `What optimizations does LINQ take over me just writing a function that manually parses the results? Is this just semantics or is there more to it than that?` – drew_w Dec 13 '13 at 19:29
  • 1
    @drew_w It's just semantics - I tried to explain that by showing what's happening, but it's just transforming to method calls, and the method calls (internally) for the most part just loop. WIth `IQueryable`, ther'es optimizations in that the query can happen on the server, not locally, but with LINQ to Objects, it's more about making the usage simplified and cleaner. – Reed Copsey Dec 13 '13 at 19:45
  • @drew_w if you want to see CIL code,open your .exe or .dll file with ILDASM.exe (intermediate language disassembler) and see the cil code. – Selman Genç Dec 13 '13 at 19:49
  • The link is broken, the new link is https://codeblog.jonskeet.uk/2010/09/03/reimplementing-linq-to-objects-part-1-introduction/ – kuldeep Sep 14 '22 at 10:58
-1

Query syntax is just syntactic sugar for method syntax, it effectively gets compiled to this:

var productNames = Products().Where(p => p.Id > 100 && p.Id < 5000).Select(p => productName);

Now what those functions actually do depends on which flavour of LINQ you're using e.g. Linq to Objects (which chains together in-memory handlers) or Linq to SQL (which converts it an SQL query) etc.

Mark Feldman
  • 15,731
  • 3
  • 31
  • 58
  • 1) You don't really answer your own question. The clarifications added make it rather clear that you're *not* interested in how query syntax maps to method syntax, but rather the implementations of the actual `Where`, `Select`, etc. methods. 2) Everything you've said after the code snippet isn't write. Every single linq-to-anythingButObjects provider is based on the `IQueryable` interface, and the `Where`, `Select`, etc. methods are all `Queryable.Where`, etc., and are not different methods for each query provider. It is what the query provider does with the expressions generated that differs. – Servy Dec 13 '13 at 22:06