27

How to efficiently cache methods compiled from an expression tree ?

public void SomeToStringCalls()
{
    ToString(i => (i + 1).ToString(), 1);
    ToString(i => (i + 1).ToString(), 2);
    ToString(i => (i + 2).ToString(), 3);
    ToString(i => (i + 2).ToString(), 4);
}

private string ToString<T>(Expression<Func<T, string>> expression, T input)
{
    var method = expression.Compile();
    return method.Invoke(input);
}

Above, every call is going to recompile each expression, even if some are identical. I can't have a Dictionary<Expression<Func<T, string>>, Func<T, string>>() caching the compiled method from the expression because the equals will fails.

Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313
Toto
  • 7,491
  • 18
  • 50
  • 72
  • 1
    Are you sure you want to and need to cache these? Also, should `ToString(j => (j + 1).ToString(), 5);` also benefit from the cached version produced by your first call? That would indicate that you need a comprehensive way to determine equality between expression trees. – Damien_The_Unbeliever Oct 30 '13 at 09:30
  • Yes I need the cache because calls are very frequent. Calls are always done with the same expression. Say I can have a lot of call for both j => (j + 1).ToString() and i => (i + 1).ToString(), i can have 2 differents methods cached in that case (ie I don't need to identy 'similarly equals' expressions. – Toto Oct 30 '13 at 09:51
  • 1
    And did you find a REAL performance problem from compiling or are you prematurely optimizing? Comparing Expressions for equality is [very difficult](http://stackoverflow.com/questions/283537/most-efficient-way-to-test-equality-of-lambda-expressions) and apart from a quick comparison of the expressions' 'ToString()' results, it will cost much more than compiling. – Panagiotis Kanavos Oct 30 '13 at 10:55
  • Do you actually need expression trees, or could you just use delegates and skip the compilation entirely? Where are these expressions coming from? – Mike Strobel Oct 30 '13 at 14:04
  • @Panagiotis : yes benchmarks was done and if I make a kind of fake cache, I have intersting performance changes. – Toto Oct 30 '13 at 15:15
  • @Mike : I need expression trees because theses expression are also used parsed for sql and custom visiting – Toto Oct 30 '13 at 15:16
  • I'm thinking if you had the "code of the closure" it might be a lot easier to deal with. (Could possibly use the string of that code to hash on etc) e.g.: `"i => (i + 1).ToString()"` string itself. Not that comparison and hash of long strings is great, but it might be faster than expression compilation. `Expression.ToString()` might be a start? –  Nov 06 '13 at 16:02
  • `Expression.ToString()` is a poor choice because it is lacking in type information. You would be better off emitting a delegate that reads the private `DebugView` property, which is far more comprehensive. Still, it would only be useful for hashing, and not for an exact equality check. But really, those string operations are expensive enough that a hashing function like mine is probably faster anyway. – Mike Strobel Nov 06 '13 at 16:30
  • @MikeStrobel What type information would be needed? Ok, so maybe `typeof(T).FullName + ":" + Expression.ToString()`. The string gets even longer... (which seems the biggest problem) –  Nov 06 '13 at 18:07
  • @ebyrob The type of every lambda parameter, the return type of methods called, the type of properties accessed, etc. – Mike Strobel Nov 06 '13 at 19:03
  • 1
    Instead of caching, why don't you use a global dictionary of Expressions over a key & you access expression with key. Have a look at SecurityContext at http://entityrestsdk.codeplex.com – Akash Kava Nov 12 '13 at 06:45

7 Answers7

18

The problems with caching expression trees in a centralized cache are:

  1. You will need comprehensive equality comparison and hashing algorithms.
  2. You will need to implement these algorithms yourself, as the standard expression types do not provide them out of the box.

A comprehensive equality comparison will be expensive to perform, but the cost can be alleviated somewhat with a cheap hash function. Also, since expression trees are immutable, you can cache the hash code after you've calculated it for the first time. That may shave some time off lookups, but each time you check the cache using a newly created expression as the key (which, I would imagine, would be most of the time) you will at least need to hash the new expression.

Option 1: Local/Static Caching

An ideal solution would avoid all of this overhead. If it's feasible (i.e., if these expressions are not composed dynamically), your best bet is to simply cache the expression trees near their declaration sites. You should be able to store most (if not all) of these in static fields:

private static readonly Expression<Func<int, string>> _addOne =
    i => (i + 1).ToString();
private static readonly Expression<Func<int, string>> _addTwo =
    i => (i + 2).ToString();

public void SomeToStringCalls()
{
    ToString(_addOne, 1);
    ToString(_addOne, 2);
    ToString(_addTwo, 3);
    ToString(_addTwo, 4);
}

The downside is that you may end up with duplicate expressions across various types, but if you are not generating a very large number of expressions, this may not be an issue.

Option 2: Centralized Caching

If that's not an option for you, you'll have to implement a centralized cache and the hashing and equality algorithms required to do so. The hashing algorithm will help you narrow down your candidate set, so it's important that it work reasonably well, i.e., produce very few collisions in practice. However, given the complex nature of expression trees, you want to keep the costs down. Therefore, I would advise you not aim for a perfect hashing function, but one that is reasonably cheap, yet effective. Perhaps something like this:

internal static class ExpressionHasher
{
    private const int NullHashCode = 0x61E04917;

    [ThreadStatic]
    private static HashVisitor _visitor;

    private static HashVisitor Visitor
    {
        get
        {
            if (_visitor == null)
                _visitor = new HashVisitor();
            return _visitor;
        }
    }

    public static int GetHashCode(Expression e)
    {
        if (e == null)
            return NullHashCode;

        var visitor = Visitor;

        visitor.Reset();
        visitor.Visit(e);

        return visitor.Hash;
    }

    private sealed class HashVisitor : ExpressionVisitor
    {
        private int _hash;

        internal int Hash
        {
            get { return _hash; }
        }

        internal void Reset()
        {
            _hash = 0;
        }

        private void UpdateHash(int value)
        {
            _hash = (_hash * 397) ^ value;
        }

        private void UpdateHash(object component)
        {
            int componentHash;

            if (component == null)
            {
                componentHash = NullHashCode;
            }
            else
            {
                var member = component as MemberInfo;
                if (member != null)
                {
                    componentHash = member.Name.GetHashCode();

                    var declaringType = member.DeclaringType;
                    if (declaringType != null && declaringType.AssemblyQualifiedName != null)
                        componentHash = (componentHash * 397) ^ declaringType.AssemblyQualifiedName.GetHashCode();
                }
                else
                {
                    componentHash = component.GetHashCode();
                }
            }

            _hash = (_hash * 397) ^ componentHash;
        }

        public override Expression Visit(Expression node)
        {
            UpdateHash((int)node.NodeType);
            return base.Visit(node);
        }

        protected override Expression VisitConstant(ConstantExpression node)
        {
            UpdateHash(node.Value);
            return base.VisitConstant(node);
        }

        protected override Expression VisitMember(MemberExpression node)
        {
            UpdateHash(node.Member);
            return base.VisitMember(node);
        }

        protected override MemberAssignment VisitMemberAssignment(MemberAssignment node)
        {
            UpdateHash(node.Member);
            return base.VisitMemberAssignment(node);
        }

        protected override MemberBinding VisitMemberBinding(MemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberBinding(node);
        }

        protected override MemberListBinding VisitMemberListBinding(MemberListBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberListBinding(node);
        }

        protected override MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberMemberBinding(node);
        }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            UpdateHash(node.Method);
            return base.VisitMethodCall(node);
        }

        protected override Expression VisitNew(NewExpression node)
        {
            UpdateHash(node.Constructor);
            return base.VisitNew(node);
        }

        protected override Expression VisitNewArray(NewArrayExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitNewArray(node);
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitParameter(node);
        }

        protected override Expression VisitTypeBinary(TypeBinaryExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitTypeBinary(node);
        }
    }
}

It's not perfect, but it should get you pretty good results:

  1. It drills down and includes every expression in the tree.
  2. At minimum, the NodeType of every sub-expression is included in the hash. One obvious (but potentially costly) improvement would be to also include the Type; try it if you find you are getting too many collisions.
  3. Members and types referenced in the expression are included.
  4. Constant values appearing in the tree are included.
  5. It doesn't require a heap allocation to run, at the expense of being non-reentrant (since you're only analyzing a top-level expression, this is fine). You can run it concurrently on multiple threads.

Since you're not actually overriding GetHashCode() for any of the expression types, the imperfections of the hashing function will not leak out and affect external code. This gives us a degree of freedom in bending the rules of hash functions.

Your equality comparison will need to be more comprehensive. Whereas the hashing function is effectively an 'estimate' used to minimize your candidate set, the equality comparison performs the actual matching, and it needs to be perfect. You could certainly use my proposed hashing solution as a template for how to approach the problem, but bear in mind you must perform an exhaustive comparison of all an expression's attributes. One thing to bear in mind is that you probably don't want to compare the names of the ParameterExpression nodes in the tree, but you'll want to maintain a mapping of the parameters/variables in the two trees you are comparing to make sure they represent the "same" value in the context of their parent expression tree.

With the exception of ignoring parameter/variable names, do not bother attempting to resolve "semantic equivalence", i.e., expressions which produce the same results and side effects but are not structurally identical. It cannot be done efficiently, and it's not worth it to try.

Lastly, you might be able to speed things up by implementing a two-level lookup: first, choose the correct cache based on the parameter type(s), then look for a match within that cache. This approach would be most effective if it is guaranteed that each lambda expression will have exactly one argument. The benefits would degrade with more arguments.

Mike Strobel
  • 25,075
  • 57
  • 69
  • Very good, thanks. But as OP asked, it must be used in a `Dictionary>, Func>` for caching purposes, so how should we use your `ExpressionHasher.GetHashCode(Expression e)` code? For example, the sample code for passing `IEqualityComparer` to cache Dictionary as this interface also needs `Equals(T x, T y)` in addition to `GetHashCode(T obj)` – S.Serpooshan Sep 17 '18 at 08:29
  • Thank you for an outstanding answer. Impressive reasoning. Quick question. For: `Expression> expression = x => string.Equals("a", "b"); ExpressionHasher.GetHashCode(expression); ` Throws null reference on `.Visit()` [See Fiddle](https://dotnetfiddle.net/CMMOF2). I've added a null check seems to have fixed the problem, would that be correct? Also any suggestions for using Indexers? – Cristian E. Jun 18 '19 at 21:30
5

I found quite some time ago this article, which is exposing pros & cons of expression caching (with constant extraction... which allows you to compile .Where(t=>t.prop==3) and .Where(t=>t.prop==5) to the same delegate).

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Olivier
  • 5,578
  • 2
  • 31
  • 46
  • 2
    If the site is down, the lastest version from wayback machine: https://web.archive.org/web/20191204114637/http://tips.x-tensive.com/2009/05/fast-expression-compilation.html – user1781290 Nov 05 '20 at 10:42
2

The reason why you can't use Dictionary<Expression<Func<T, string>>, Func<T, string>> is that Expression<T> GetHashCode is not smart enough to detect "equal" expressions. I'm not sure, but it's quite likely that Expression<T>.GetHashCode returns memory address of expression.

To address this issue you could introduce a more "smart" hash calculation. Let's consider equal expressions that have equal bodies. It's quite slippery path, but if you are willing to take responsibility on ensuring that:

  1. No two different expressions have the same hash code
  2. Two expressions with same body have the same hash code

you could achieve what you want.

Here's simple proof of concept code I've assembled for you at pastebin. It's not industrial strength (some hints in comments to improve it), however it clearly demonstrates the feasibility of approach.

A couple of things to consider before elaborating further: improper hash function might result in quite tricky bugs. So, think twice, write a lot of unit tests and prey :)

Bas
  • 26,772
  • 8
  • 53
  • 86
J0HN
  • 26,063
  • 5
  • 54
  • 85
  • Edit was by accident; had some kind of blackout and appended to your answer instead of mine. – Bas Nov 06 '13 at 15:39
  • 1
    "So, think twice, write a lot of unit tests and **prey**" => Unintentionally hilarious typo :D – Mike Strobel Nov 06 '13 at 16:35
  • 1
    `Expression` doesn't override `GetHashCode`, so it inherits the default, so you're close in your guess that it "returns memory address of expression": it's based on the [reference](http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.runtimehelpers.gethashcode(v=vs.110).aspx) – Tim S. Nov 06 '13 at 16:39
1

The problem you describe is severe in the sense that evaluating two expressions for semantic equality is at least as expensive as compiling the expression. To illustrate this, here is a link to an implementation for expression equality. This implementation is not perfect, for example:

MethodA() { MethodB(); }
MethodB() { ... }

In the example above, MethodA and MethodB are equivalent in the sense that they do the same thing, and you most likely want to consider them as equivalent. For example, building this in C# with compiler optimization enabled will replace the MethodB call with MethodA calls. There are a myriad of issues when comparing code and it is a topic of ongoing research.

You should consider a design where expressions are referenced by some kind of key that identifies it if you find out that compiling the expressions is the bottleneck in your application. By the time you have determined equality you could have already compiled it.

To comment on J0HN's answer, it compares the hash code of the body and the parameters, this is by no means a reliable solution, since it does not do a deep evaluation of the expression tree.

Also, take a look at this question as posted in the comments.

Community
  • 1
  • 1
Bas
  • 26,772
  • 8
  • 53
  • 86
  • Wouldn't comparing code based on “they do the same thing” require solving the (unsolvable) halting problem? – svick Nov 06 '13 at 15:35
  • @svick I think no, because when they are not too obviously equal, we can just treat them as different. – Display Name Nov 06 '13 at 15:38
  • @SargeBorsch But that means dealing with false negatives. To solve it in the "always correct" manor is provably impossible. In fact, proving that two blocks of code always provide the same output for any given input is a famous problem that is provably insolvable. – Servy Nov 06 '13 at 15:39
  • @Servy False negatives can be considered as a non-critical error, since the code will still do what it does, there will just be a possible performance impact. False positives on the other hand creates bugs and execution of the wrong code. – Bas Nov 06 '13 at 15:40
  • 2
    "By the time you have determined equality you could have already compiled it." => I am highly skeptical of this claim. The expression compilation process involves several passes through the tree, and a few transforms. A single pass on two expression trees to determine equality should be far less costly in most cases. – Mike Strobel Nov 06 '13 at 16:51
  • @MikeStrobel Did you measured if the equality detection is less time consuming than expression compilation? Does asp.net do some type of caching for lambda expressions usually used in HtmlHelper methods like as `@Html.TextBoxFor()`...? – S.Serpooshan Sep 17 '18 at 08:11
0

If your goal is to compile + invoke for "extract value" from expression, may be you look into another way.

I trying to extract value from expression tree without compilation through reflection.

My solution is not fully support all expressions, at first it was written for caching method invokation without lambda and arithmetic, but with some improvements it can help.

Here it is:

private static object ExtractValue(Expression expression, object[] input, ReadOnlyCollection<ParameterExpression> parameters)
{
    if (expression == null)
    {
        return null;
    }

    var ce = expression as ConstantExpression;
    if (ce != null)
    {
        return ce.Value;
    }

    var pe = expression as ParameterExpression;
    if (pe != null)
    {
        return input[parameters.IndexOf(pe)];
    }

    var ma = expression as MemberExpression;
    if (ma != null)
    {
        var se = ma.Expression;
        object val = null;
        if (se != null)
        {
            val = ExtractValue(se, input, parameters);
        }

        var fi = ma.Member as FieldInfo;
        if (fi != null)
        {
            return fi.GetValue(val);
        }
        else
        {
            var pi = ma.Member as PropertyInfo;
            if (pi != null)
            {
                return pi.GetValue(val);
            }
        }
    }

    var mce = expression as MethodCallExpression;
    if (mce != null)
    {
        return mce.Method.Invoke(ExtractValue(mce.Object, input, parameters), mce.Arguments.Select(a => ExtractValue(a, input, parameters)).ToArray());
    }

    var sbe = expression as BinaryExpression;
    if (sbe != null)
    {
        var left = ExtractValue(sbe.Left, input, parameters);
        var right = ExtractValue(sbe.Right, input, parameters);

        // TODO: check for other types and operands

        if (sbe.NodeType == ExpressionType.Add)
        {
            if (left is int && right is int)
            {
                return (int) left + (int) right;
            }
        }

        throw new NotImplementedException();
    }

    var le = expression as LambdaExpression;
    if (le != null)
    {
        return ExtractValue(le.Body, input, le.Parameters);
    }

    // TODO: Check for other expression types

    var dynamicInvoke = Expression.Lambda(expression).Compile().DynamicInvoke();
    return dynamicInvoke;
}

With usage:

private static string ToString<T>(Expression<Func<T, string>> expression, T input)
{
    var sw = Stopwatch.StartNew();
    var method = expression.Compile();
    var invoke = method.Invoke(input);
    sw.Stop();
    Console.WriteLine("Compile + Invoke: {0}, {1} ms", invoke, sw.Elapsed.TotalMilliseconds);
    sw.Restart();
    var r2 = ExtractValue(expression, new object[] {input}, null);
    sw.Stop();
    Console.WriteLine("ExtractValue: {0}, {1} ms", r2, sw.Elapsed.TotalMilliseconds);
    return invoke;
}

I think, with some improvements and additional expression types this solution could be faster alternative for Compile().DynamicInvoke()

Vladimir
  • 2,082
  • 1
  • 13
  • 27
  • 1
    Isn't this just interpreting the expression rather than compiling it? Not a bad way to go necessarily, just different from what was originally asked. –  Nov 06 '13 at 19:14
0

See also System.Web.Mvc.ExpressionUtil.CachedExpressionCompiler which can be used to cache expression compilations. If using Mvc 5 call it through reflection, or otherwise you can bring a copy of the source code.

Bouke
  • 11,768
  • 7
  • 68
  • 102
-1

Call me simplistic, but this seems about 4 times faster in one simple scenario I tested:

    public static Dictionary<string, object> cache
        = new Dictionary<string, object>();
    public static string ToString<T>(
            Expression<Func<T, string>> expression,
            T input)
    {
        string key = typeof(T).FullName + ":" + expression.ToString();
        object o;  cache.TryGetValue(key, out o);
        Func<T, string> method = (Func<T, string>)o;
        if (method == null)
        {
            method = expression.Compile();
            cache[key] = method;
        }
        return method.Invoke(input);
    }
  • This will not work with: var x = 1; Console.WriteLine(ToString2(i => (i + x).ToString(), 1)); x += 10; Console.WriteLine(ToString2(i => (i + 1).ToString(), 1)); – Vladimir Nov 06 '13 at 18:43
  • @pil0t That's a good point. But, I don't think any caching can work with anything but closures. (Not sure how to enforce that though) –  Nov 06 '13 at 18:48
  • Not reliable. Expression.ToString() lacks sufficient type information for this to work. Consider the following: `o => ((A)o).X` versus `o => ((B)o).X`: you produce identical keys for both: `System.Object:o => Convert(o).X`, yet the expressions are clearly not equivalent. – Mike Strobel Nov 06 '13 at 19:37
  • @MikeStrobel Wow, I didn't realize. `Convert(o).X` isn't even valid code. I guess to get anything close to a canonical representation would require a custom tree-walk (similar to your solution). I guess my original intent was just to get the string the caller used to create the expression in the first place as that would be guaranteed (given the same accessible environment) to completely describe the operation. –  Nov 06 '13 at 21:15
  • Yeah, `Expression.ToString()` does **not** produce C# code; it's essentially pseudocode and not very complete. – Mike Strobel Nov 06 '13 at 21:18
  • @MikeStrobel I feel like your double-walk hash code/compare operations would definitely give some benefit, but it seems like there should be a single-walk "unique descriptor" generation that could serve both purposes at once though... almost like a JS obfuscator, or byte-code generator... –  Nov 06 '13 at 21:21
  • @ebyrob Well, yes, it could be done with a single pass, but it would require implementing a custom data structure for the cache. If he wants to use a `Dictionary<,>` or something similar, it'll end up performing separate hashing and equality checks anyway. As to reducing the expressions to a "unique descriptor", that is effectively what a hash is. Any technique that does not produce an exact 1:1 pairing with no possibility of collisions is just a complex hashing function anyway. – Mike Strobel Nov 06 '13 at 21:24
  • @ebyrob Also, since expressions are immutable, the hash code can be cached in a "wrapper" to serve as a (pseudo-)unique descriptor. – Mike Strobel Nov 06 '13 at 21:30