63

I love using LINQ in .NET, but I'd like to know how it works internally.

underscore_d
rpf
  • 8
    Consider buying Jon Skeet's book C# in Depth – Brian R. Bondy Mar 22 '09 at 16:32
  • 1
    C# 3.0 in a nutshell is also good. – Mehrdad Afshari Mar 22 '09 at 16:34
  • 2
    I read one third of Linq in Action (http://www.manning.com/marguerie/) and it's a great book. – pero Mar 23 '09 at 13:18
  • @BrianR.Bondy Yes, useful. I found the book here: https://livebook.manning.com/#!/book/c-sharp-in-depth-third-edition/chapter-1 – Bimal Das Dec 06 '17 at 05:20
  • This is a course on PluralSight: **LINQ Architecture**. The **"LINQ - Beyond Queries"** chapter gives a good understanding of how LINQ-like methods are designed. https://app.pluralsight.com/library/courses/linq-architecture – AlexMelw May 02 '20 at 16:38
  • This explains what happens behind the scenes in .NET Framework and .NET 5: https://levelup.gitconnected.com/linq-behind-the-scenes-efd664d9ebf8?sk=cba7416407ec8b753d9961fe23aac173 – David Klempfner Apr 20 '21 at 10:46

4 Answers

88

It makes more sense to ask about a particular aspect of LINQ. It's a bit like asking "How Windows works" otherwise.

For me, the key parts of LINQ, from a C# perspective, are:

  • Expression trees. These are representations of code as data. For instance, an expression tree could represent the notion of "take a string parameter, call the Length property on it, and return the result". The fact that these exist as data rather than as compiled code means that LINQ providers such as LINQ to SQL can analyze them and convert them into SQL.
  • Lambda expressions. These are expressions like this:

    x => x * 2
    (int x, int y) => x * y
    () => { Console.WriteLine("Block"); Console.WriteLine("Lambda"); }
    

    Lambda expressions are converted either into delegates or expression trees.

  • Anonymous types. These are expressions like this:

    new { X=10, Y=20 }
    

    These are still statically typed; the compiler simply generates an immutable type for you with properties X and Y. They are usually used with var, which allows the type of a local variable to be inferred from its initialization expression.

  • Query expressions. These are expressions like this:

    from person in people
    where person.Age < 18
    select person.Name
    

    These are translated by the C# compiler into "normal" C# 3.0 (i.e. a form which doesn't use query expressions). Overload resolution etc. is applied afterwards, which is absolutely key to being able to use the same query syntax with multiple data types, without the compiler having any knowledge of types such as Queryable. The above expression would be translated into:

    people.Where(person => person.Age < 18)
          .Select(person => person.Name)
    
  • Extension methods. These are static methods which can be used as if they were instance methods of the type of the first parameter. For example, an extension method like this:

    public static int CountAsciiDigits(this string text)
    {
        return text.Count(letter => letter >= '0' && letter <= '9');
    }
    

    can then be used like this:

    string foo = "123abc456";
    int count = foo.CountAsciiDigits();
    

    Note that the implementation of CountAsciiDigits uses another extension method, Enumerable.Count().
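To illustrate the point that query expressions are translated first and resolved afterwards, here is a minimal sketch. `Maybe<T>` and `QueryPatternDemo` are hypothetical names invented for this example; the type implements no LINQ interface and only exposes methods called `Where` and `Select`, which is enough for the query syntax to compile against it.

```csharp
using System;

// Hypothetical type for illustration: not IEnumerable, not IQueryable.
// It only has methods named Where and Select with suitable signatures.
public sealed class Maybe<T>
{
    public bool HasValue { get; }
    public T Value { get; }

    public Maybe() { }                                   // empty
    public Maybe(T value) { HasValue = true; Value = value; }

    public Maybe<T> Where(Func<T, bool> predicate) =>
        HasValue && predicate(Value) ? this : new Maybe<T>();

    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector) =>
        HasValue ? new Maybe<TResult>(selector(Value)) : new Maybe<TResult>();
}

public static class QueryPatternDemo
{
    public static Maybe<int> LengthIfShort(Maybe<string> source) =>
        // Rewritten by the compiler to:
        // source.Where(s => s.Length < 10).Select(s => s.Length)
        from s in source
        where s.Length < 10
        select s.Length;

    public static void Main()
    {
        Maybe<int> result = LengthIfShort(new Maybe<string>("LINQ"));
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "(empty)");
    }
}
```

The compiler never checks for a particular interface here; it rewrites the query into method calls and then lets ordinary overload resolution find `Maybe<T>.Where` and `Maybe<T>.Select`.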

Those are most of the relevant language aspects. Then there are the implementations of the standard query operators in LINQ providers such as LINQ to Objects and LINQ to SQL. I have a presentation about how it's reasonably simple to implement LINQ to Objects; it's on the "Talks" page of the C# in Depth web site.
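As a rough idea of why LINQ to Objects is reasonably simple to implement, the sketch below writes two operators as extension methods with iterator blocks. The names `MiniLinq`, `MyWhere` and `MySelect` are made up for this example (chosen so they cannot clash with `System.Linq`), and it leaves out the argument validation a real implementation would need.

```csharp
using System;
using System.Collections.Generic;

public static class MiniLinq
{
    // Yields only the items for which the predicate returns true.
    public static IEnumerable<T> MyWhere<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    // Yields the projection of each item.
    public static IEnumerable<TResult> MySelect<T, TResult>(
        this IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item);
        }
    }
}

public static class MiniLinqDemo
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Nothing runs yet: iterator blocks execute lazily, on enumeration.
        IEnumerable<int> query = numbers.MyWhere(n => n % 2 == 0)
                                        .MySelect(n => n * n);

        Console.WriteLine(string.Join(", ", query)); // 4, 16, 36
    }
}
```

Because the lambdas are converted to delegates here, the operators just invoke them in-process; no expression trees are involved.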

The way providers such as LINQ to SQL work is generally via the Queryable class. At their core, they translate expression trees into other query formats, and then construct appropriate objects with the results of executing those out-of-process queries.
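To make the translation step a little more concrete, here is a toy sketch, not how LINQ to SQL is actually implemented. `Person` and `TinyTranslator` are hypothetical names; the translator handles only a handful of node types and inlines constants, whereas a real provider would send them as parameters. It also shows the same lambda converted once to a delegate and once to an expression tree, depending on the target type.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

public sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class TinyTranslator
{
    // Turns a very small subset of predicates into a SQL-like string.
    public static string ToSqlWhere<T>(Expression<Func<T, bool>> predicate) =>
        Translate(predicate.Body);

    private static string Translate(Expression node)
    {
        switch (node)
        {
            case BinaryExpression b:
                return $"({Translate(b.Left)} {Operator(b.NodeType)} {Translate(b.Right)})";
            case MemberExpression m when m.Expression is ParameterExpression:
                return m.Member.Name;                    // p.Age -> Age
            case ConstantExpression c:
                return Convert.ToString(c.Value, CultureInfo.InvariantCulture);
            default:
                throw new NotSupportedException($"Unsupported node: {node.NodeType}");
        }
    }

    private static string Operator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.LessThan: return "<";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.Equal: return "=";
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}

public static class TranslatorDemo
{
    public static void Main()
    {
        // Same lambda, two target types: compiled code vs. code as data.
        Func<Person, bool> asDelegate = p => p.Age < 18;
        Expression<Func<Person, bool>> asTree = p => p.Age < 18;

        Console.WriteLine(asDelegate(new Person { Age = 10 }));  // True
        Console.WriteLine(TinyTranslator.ToSqlWhere(asTree));    // (Age < 18)
    }
}
```

The delegate can only be invoked, while the expression tree can be inspected node by node, which is what lets a provider produce a query in another language.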

Does that cover everything you were interested in? If there's anything in particular you still want to know about, just edit your question and I'll have a go.

Jon Skeet
  • @JonSkeet I would add generics (and type inference and `var`) to the key aspects that makes linq possible.. – nawfal Oct 20 '13 at 06:52
  • @nawfal: I've got `var` already under "anonymous types" - and while generics are necessary, they were already part of C# 2, and broadly used outside LINQ. If we're going to list *everything* required, that would have to include "property access", and "variables" etc. – Jon Skeet Oct 20 '13 at 07:10
  • @JonSkeet `var` deserves a mention outside anonymous types, because it would have been a turn off to write `IOrderedEnumerable` etc, similarly type inference. Hmmm, personally I feel generics (even though came before) is little different from property access when it comes to its utility in linq, ymmv. – nawfal Oct 20 '13 at 07:18
  • Thanks for sharing. In what case will lambda expressions be converted to delegates, and what cases to expression tree? – VincentZHANG May 30 '17 at 21:40
  • @VincentZHANG: Basically, it's whatever you're trying to convert it to, as part of a method call, or assigning to a variable or whatever. – Jon Skeet May 31 '17 at 07:31
5

LINQ is basically a combination of these discrete C# 3.0 features:

  • local variable type inference
  • auto properties (not implemented in VB 9.0)
  • extension methods
  • lambda expressions
  • anonymous type initializers
  • query comprehension

For more information about the journey to get there (LINQ), see this video of Anders in LANGNET 2008:

http://download.microsoft.com/download/c/e/5/ce5434ca-4f54-42b1-81ea-7f5a72f3b1dd/1-01%20-%20CSharp3%20-%20Anders%20Hejlsberg.wmv

Eriawan Kusumawardhono
3

In simple terms, the compiler takes your query code and converts it into a series of calls to generic classes and methods. Underneath, in the case of Linq2Sql, a dynamic SQL query gets constructed and executed using DbCommand, DbDataReader etc.

Say you have:

var q = from x in dc.mytable select x;

it gets converted into the following code:

// exp is a compiler-generated temporary; the lambda body is the parameter itself (x => x).
ParameterExpression exp;
IQueryable<mytable> q = 
    dc.mytable.Select<mytable, mytable>(
        Expression.Lambda<Func<mytable, mytable>>(
            exp = Expression.Parameter(typeof(mytable), "x"), 
            new ParameterExpression[] { exp }
        )
    );

Lots of generics, huge overhead.

Ruslan
  • 1
    Huge overhead? what do you mean? – Pop Catalin Mar 22 '09 at 17:45
  • The Select above will end up calling a provider's execute method, which initializes the mode, determines the connection, checks transactions, initializes parameter collections, calls a reader, translates results, parses... thousands of lines. – Ruslan Mar 22 '09 at 21:01
  • 3
    @Ruslan, almost all the things you mentioned you have to do anyway so they are not considered overhead, and besides, checking some things like if there is a transaction attached has a tiny cost compared to executing the command on the DB. – Pop Catalin Mar 22 '09 at 22:11
  • 1
    The only overhead Linq has is translating the expression tree into a SQL query... but that doesn't have anything to do with generics, from your answer I thought you said the generics are the overhead. – Pop Catalin Mar 22 '09 at 22:13
  • No, I didn't mean generics. I meant lots of code behind doing what is normally done by the SQL server. IIRC, when asked at PDC about LINQ performance, they said something around 15% overhead. I'd have to do profiling myself. – Ruslan Mar 23 '09 at 01:39
1

Basically, LINQ is a mixture of some language facilities (compiler) and some framework extensions. So when you write LINQ queries, they get executed using appropriate interfaces such as IQueryable. Also note that the runtime has no role in LINQ.

But it is difficult to do justice to LINQ in a short answer. I recommend reading a book to get into it. I am not sure which book covers the internals of LINQ, but Linq in Action gives a good hands-on introduction.

Hemant