How do Expression trees allow consumers to evaluate variables?

Question

This question is trying to understand how LINQ works, to implement something similar in another language.

Consider the following LINQ query, to be converted into an expression tree:

var my_variable = "abc";
var qry = from x in source.Foo
          where x.SomeProp == my_variable
          select x.Bar;

which is mapped by the compiler into code:

var qry = source.Foo
           .Where(x => x.SomeProp == my_variable)
           .Select(x => x.Bar);

When this is converted to an expression tree, how do consumers of this expression tree get access to the value of "my_variable".

NOTE: I see now that my confusion was looking at Expression Trees as if they were ASTs. They are not. They are a blend of (passive) AST, and pointers to live program data. That way, the consumer of the Expression tree can decide either to look at the symbolic AST source information, or evaluate it into the live program value.

This answer was the most helpful in helping me understand:

How to get the value of a ConstantExpression which uses a local variable?

Don't ask the same question again. Edit the first one if you don't think it's a duplicate and try to have it reopened. — Manfred Radlwimmer, May 23 '17 at 06:33
Linq is a computer language just like c#. There are rules that the language must follow. So any computer language will have variables, pre-defined words, and operators. — jdweng, May 23 '17 at 06:37
It turns out my problem was mis-understanding Expression Trees.. I thought they were "passive" ASTs, but they are not. Expression nodes (like MemberExpression) are connected to live program data, which allows consumers of the expression to retrieve the live value through the Expression Tree. - https://stackoverflow.com/questions/6998523/how-to-get-the-value-of-a-constantexpression-which-uses-a-local-variable — David Jeske, May 28 '17 at 04:08
@ManfredRadlwimmer FYI - I posted a new question because the site instructions specifically said to "edit the question or post a new one", and the old one was marked as a duplicate of something which wasn't an answer to my question. — David Jeske, May 28 '17 at 04:13
@DavidJeske I highly doubt that it says anywhere on this site that when your question gets put on hold, to simply copy and paste it as a new question. — Manfred Radlwimmer, May 28 '17 at 10:53
@ManfredRadlwimmer my original question was marked as a duplicate of something that was unrelated and did not answer my question. I edited my question, and voted for it to be re-opened, but it was not re-opened. The only choice left was to post again. I can't delete the original question, or I would. — David Jeske, Jun 02 '17 at 16:04

score 0 · Answer 1 · answered May 23 '17 at 11:32

LINQ has two versions. One for IEnumerable sequences and one for IQueryable sequences.

The IEnumerable version is the easiest. That's just calling the correct extension functions with the correct parameters. The reference source code can be found here.

Since you are talking about queries, I assume you want to know how an IQueryable does this.

The difference between a Queryable object that implements IEnumerable and a sequence class that implements the same interface, is that the Queryable object holds an expression and a provider who can calculate the expression. The extension functions for IQueryable like Where and Select know how to change the expression in order to make it an expression that will select the items specified by the LINQ statement.

Since they are extension functions of the IQueryable class, people who implement IQueryable don't have to provide functionality for Where / Select / GroupBy etc. So the implementer can be asserted that the proper Expression has been made.

The difficult part would not be to implement the IQueryable for your class, but to implement the Provider object returned by IQueryable.Provider.

But before going into details for the provider, let's create the class that implements IQueryable:

For this explanation I used the following sources

Implementing a Custom LINQ provider by Jaco Pretorius
Creating a Custom LINQ provider a generic class approach by ???

Suppose I have a class Word (Just what you expect a Word to be, something like a string of characters without white spaces) and a class WordCollection that implements IQueryable<Word>. Since WordCollection implements IQueryable<Word> it also implements IQueryable, IEnumerable<Word> and IEnumerable.

class MyWordCollection : IQueryable<Word>, IQueryable,
    IEnumerable<Word>, IEnumerable
{
    public Type ElementType { get { return typeof(Word); } }
    public IQueryProvider Provider { get; private set; }
    public Expression Expression { get; private set; }

    #region enumerators
    public IEnumerator<Word> GetEnumerator()
    {
        return (Provider.Execute<IEnumerable<Word>>
            (Expression)).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
    #endregion enumerators
}

Note the difference between the IQueryable and the Provider.

The IQueryable is an object where you can query for elements. The IQueryable doesn't have to hold these elements, however it knows who can provide them.
The Provider is the one who can provide the queried objects, provided you give him an Expression to select which objects to provide.

The provider can hold these elements itself, or it can get these elements somewhere else, for instance from a database, or form a file, download the data from the internet, or just hold a sequence of these elements inside it.

Because you have hidden where the Provider gets its data, you can exchange it by other Providers that will provide the same type of elements, but fetch them from another source.

In the class we see the three properties that implement IQueryable<Word> and IQueryable:

ElementType returns the type of the query: When queried, the object will return sequences of Word
Expression holds the expression as calculated by LINQ extension functions
Provider holds the object that can interpret the expression and return the sequence of objects of ElementType, that MyWordCollection promises to return

The latter two functions implement IEnumerable<Word> and IEnumerable. They simply order the Provider to calculate the expression. The result is an Enumerable sequence in memory. they return the GetEnumerator() of this sequence

ElementType is implemented. So all we have to do is fill Expression and Provider. We can do this in a constructor:

public WordCollection(IQueryProvider provider, Expression expression)
{
    this.Provider = provider;
    this.Expression = expression;
}

This constructor gives the user of the WordCollection to put any Provider in it, for instance a Provider that gets its Words from a file, or a Provider that gets its words from the internet, etc.

Let's look at the Provider. The Provider has to implement IQueryProvider, which has four functions: two generic ones for IQueryable<Word> and two non-generic ones for IQueryable.

One of the functions is, given an expression return an IQueryable that can execute this expression. The other function is finally the one will interpret the expression and return the requested item

As an example I have a word provider that represents a collection of all words in the sonnets of William ShakeSpeare

internal class ShakespeareDownloader
{
    private const string sonnetsShakespeare = "http://www.gutenberg.org/cache/epub/1041/pg1041.txt";

    public string DownloadSonnets()
    {
        string bookShakespeareSonnets = null;
        using (var downloader = new WebClient())
        {
            bookShakespeareSonnets = downloader.DownloadString(sonnetsShakespeare);
        }
        return bookShakespeareSonnets;
    }
}

So now we have a source for all the words, I can create a word provider:

class WordProvider : IQueryyProvider
{
    private IQueryable<Word> allSonnetWords = null;

    private IQueryable<Word> GetAllSonnetWords()
    {
        if (allSonnetWords == null)
        {
            string sonnets = shakespeareDownLoader.DownloadSonnets();
            // split on whitespace:
            string[] sonnetWords = sonnets.Split((char[])null,
                StringSplitOptions.RemoveEmptyEntries);

            this.allSonnetWords = sonnetWords
                .Select(word => new Word() { Text = word })
                .AsQueryable<Word>();
        }
    }

Ok, now we have something that can provide us with words. We can write something like:

IQueryable<Word> allWords = new WordCollection();

As seen above, this will construct a WordProvider with an expression to get all Words. As soon as the query is executed CreateQuery of the WordProvider is called. All it has to do is create a new WordCollection for this WordProvider and the given expression:

public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
{
    return new WordCollection(this, expression) as IQueryable<TElement>;
}

After a while the collection is enumerator, and WordCollection.GetEnumerator() is called. As shown above this will ask the provider to enumerate using the expression, causing WordProvider.Execute<TResult>(Expression) to be called

After all this ping-ponging between the WordCollection and the WordProvider the provider can finally can start executing the expression.

public TResult Execute<TResult>(Expression expression)
{
}

Although My provider can only provide sequences of words, this function is a generic function. The result of the execution is either a sequence of Words, or a single Word (for instance when using First(), or Max()). If it asks for something else I cannot provide it, and an exception is allowed.

The function is declared generic in case I would have been a provider that could also provide other things than Words, like sentences.

So the Execute function should interpret the Expression and return a TResult, which is expected to be either a sequence of Words or one Word.

First things we have to do is get the complete collection of sonnetWords and determine whether a sequence is requested or a single Word:

public TResult Execute<TResult>(Expression expression)
{
    IQueryable<Word> allSonnetWords = this.GetAllSonnetWords();
    var sequenceRequested = typeof(TResult).Name == "IEnumerable`1";

Until now all code was fairly straightforward. But now the difficult part is to interpret the Expression to find out what to return.

Luckily in this example the collection of allSonnetWords is also IQueryable. We have to change the expression slightly so it can work on an IQueryable instead of work on the WordProvider.

This can be done by a derived class of ExpressionVisitor:

class WordExpressionModifier : System.Linq.Expression.ExpressionVisitor
{
    public ExpressionTreeModifier(IQueryable<Word> allWords)
    {
        this.allWords = allWords;
    }

    private readonly IQueryable<Word> allWords;

    protected override Expression VisitConstant(ConstantExpression c)
    {
        // if the type of the constant expression is a WordCollection
        // return the same expression for allWords
        if (c.Type == typeof(WordCollection))
            return Expression.Constant(allWords);
        else
            return c;
    }

So now we can finish the Execute function

public TResult Execute<TResult>(Expression expression)
{
    IQueryable<Word> allSonnetWords = this.GetAllSonnetWords();
    var sequenceRequested = typeof(TResult).Name == "IEnumerable`1";

    // replace the expression for a WordCollection to an expression for IQueryable<Word>
    var expressionModifier = new ExpressionTreeModifier(allSonnetWords);
    Expression modifiedExpression = expressionModifier.Visit(expression);

    TResult result;
    if (isEnumerable)
        // A sequence is requested. Return a query on allSonnetWords provider
        result = (TResult)allSonnetWords.Provider.CreateQuery(modifiedExpression);
    else
        // a single element is requested. Execute the query on the allSonnetWords.provider
            result = (TResult)allSonnetWords.Provider.Execute(modifiedExpression);
    return result;
}

Now all work is done, we can try it:

static void Main(string[] args)
{
    IQueryable<Word> allWords = new WordCollection();
    IQueryable<Word> Bwords = allWords
        .Where(word => word.Text.FirstOrDefault() == 'b')
        .Take(40);

    foreach (var word in Bwords)
    {
        Console.WriteLine(word.Text);
    }
}

This is a very detailed answer, but it's not answering my question. My question is ... when LINQ quotes the lambda `word => word.Text.FirstOrDefault() == my_local_var`` into a runtime parsable expression, how does it know to capture "my_local_var" instead of simply putting a symbol into the expression? Is it just a set of hardcoded rules in the LINQ compiler? — David Jeske, May 24 '17 at 18:18

How do Expression trees allow consumers to evaluate variables?

1 Answers1