
What's the best way to cache expensive data obtained from reflection? For example, most fast serializers cache such information so they don't need to reflect every time they encounter the same type again. They might even generate a dynamic method which they look up by type.

Before .NET 4

Traditionally I've used a normal static dictionary for that. For example:

private static readonly ConcurrentDictionary<Type, Action<object>> cache =
    new ConcurrentDictionary<Type, Action<object>>();

public static void DoSomething(object o)
{
    Action<object> action;
    if (cache.TryGetValue(o.GetType(), out action)) // Simple lookup, fast!
    {
        action(o);
    }
    else
    {
        // Do reflection to get the action - slow.
        // Then store it: cache.TryAdd(o.GetType(), action);
    }
}

This leaks a bit of memory, but since it does so only once per Type, and types lived as long as the AppDomain, I didn't consider that a problem.
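The slow path in the snippet above can be sketched with expression trees. This is a minimal sketch, not the question's actual code: the `DoWork` method name is hypothetical, and note that the compiled delegate itself keeps the `Type` alive, which is exactly the leak discussed below.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

static class TypeActionCache
{
    private static readonly ConcurrentDictionary<Type, Action<object>> cache =
        new ConcurrentDictionary<Type, Action<object>>();

    public static void DoSomething(object o)
    {
        // GetOrAdd makes the lookup-or-create pattern a single call
        // (the factory may race under contention, but only one result is kept).
        Action<object> action = cache.GetOrAdd(o.GetType(), BuildAction);
        action(o);
    }

    // Slow path, runs roughly once per type: compile a delegate that calls a
    // (hypothetical) public parameterless instance method named "DoWork".
    private static Action<object> BuildAction(Type type)
    {
        MethodInfo method = type.GetMethod("DoWork", Type.EmptyTypes);
        ParameterExpression param = Expression.Parameter(typeof(object), "obj");
        Expression call = Expression.Call(Expression.Convert(param, type), method);
        return Expression.Lambda<Action<object>>(call, param).Compile();
    }
}
```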

Since .NET 4

But .NET 4 introduced Collectible Assemblies for Dynamic Type Generation. If I ever use DoSomething on an object declared in a collectible assembly, that assembly will never get unloaded. Ouch.

So what's the best way to cache per-type information in .NET 4 that doesn't suffer from this problem? The easiest solution I can think of is:

private static ConcurrentDictionary<WeakReference, TCachedData> cache;

But the IEqualityComparer<T> I'd have to use with that would behave very strangely and would probably violate its contract too. I'm not sure how fast the lookup would be, either.
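To make the contract problem concrete, here is a sketch of what such a comparer would have to look like. This is an illustration of why the idea is problematic, not code to adopt:

```csharp
using System;
using System.Collections.Generic;

// Illustration only: an IEqualityComparer over WeakReference keys,
// annotated with where it breaks the dictionary's assumptions.
sealed class WeakReferenceComparer : IEqualityComparer<WeakReference>
{
    public bool Equals(WeakReference x, WeakReference y)
    {
        object tx = x.Target; // null once the target has been collected
        object ty = y.Target;
        // Any two dead references compare equal here, regardless of what
        // they originally pointed at - already dubious.
        return ReferenceEquals(tx, ty);
    }

    public int GetHashCode(WeakReference obj)
    {
        object target = obj.Target;
        // Returns a different value after the target is collected than it
        // did while the target was alive, violating the requirement that a
        // key's hash code stays stable while it is in the dictionary.
        return target != null ? target.GetHashCode() : 0;
    }
}
```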

Another idea is to use an expiration timeout. That might be the simplest solution, but it feels a bit inelegant.


In cases where the type is supplied as a generic parameter, I can use a nested generic class, which should not suffer from this problem. But this doesn't work if the type is supplied in a variable.

class MyReflection
{
    internal static class Cache<T>
    {
        internal static TData data;
    }

    void DoSomething<T>()
    {
        DoSomethingWithData(Cache<T>.data);
        // Obviously simplified; should have similar creation logic to the previous code.
    }
}
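Fleshed out a little, the nested-generic idea looks like this. The static field of `Cache<T>` is initialized once per closed type `T`, so the reflection cost is paid a single time per type; caching property names here is just a stand-in for whatever the real code would build:

```csharp
using System;
using System.Linq;

static class MyReflection
{
    // Each closed Cache<T> gets its own copy of the static field, and the
    // initializer runs once per T, on first use.
    private static class Cache<T>
    {
        internal static readonly string[] PropertyNames =
            typeof(T).GetProperties().Select(p => p.Name).ToArray();
    }

    public static string[] GetPropertyNames<T>()
    {
        return Cache<T>.PropertyNames; // cheap after the first use of T
    }
}
```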

Update: One idea I've just had is using Type.AssemblyQualifiedName as the key. That should uniquely identify the type without keeping it in memory. I might even get away with using reference identity on this string.

One problem that remains with this solution is that the cached value might keep a reference to the type too. And if I use a weak reference for that, it will most likely expire long before the assembly gets unloaded. And I'm not sure how cheap it is to get a normal reference out of a weak reference. Looks like I need to do some testing and benchmarking.
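A minimal sketch of the AssemblyQualifiedName idea, assuming a placeholder `TCachedData` and a caller-supplied factory. The string key itself keeps neither the Type nor its assembly alive; but as noted above, if the cached value references the Type (e.g. a compiled delegate), the assembly is pinned again through the value:

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: key the cache on the type's name instead of the Type object.
static class NameKeyedCache<TCachedData>
{
    private static readonly ConcurrentDictionary<string, TCachedData> cache =
        new ConcurrentDictionary<string, TCachedData>();

    public static TCachedData Get(Type type, Func<Type, TCachedData> create)
    {
        // The string key does not root the Type; whether TCachedData does
        // is up to the caller's factory.
        return cache.GetOrAdd(type.AssemblyQualifiedName, _ => create(type));
    }
}
```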

CodesInChaos
  • The easiest solution I can think of is `dynamic`. – Stephen Cleary Jul 06 '11 at 18:57
  • How does dynamic help in any way? For example consider the case where I've created a dynamic method that serializes a type and want to reuse that method whenever I encounter that type. – CodesInChaos Jul 06 '11 at 18:59
  • +1 for the link. Hadn't heard about this before. – Tigran Jul 06 '11 at 19:08
  • I just came across that link in Eric Lippert's answer http://stackoverflow.com/questions/6600093/do-static-members-ever-get-garbage-collected/6600861#6600861 . Up to now I only knew that dynamic methods can be collected, but since I've never cached per-method data I didn't consider that a problem with my caching code. But full assemblies with real types is a different issue. – CodesInChaos Jul 06 '11 at 19:11
  • 23
    .NET already caches reflection member info. The .NET 1.1 version of it had cooties but that got fixed in 2.0. Caching it again ought to be non-optimal. http://msdn.microsoft.com/en-us/magazine/cc163759.aspx – Hans Passant Jul 06 '11 at 20:17
  • 1
    I usually don't cache the exact data I get back, but usually something derived from it. For example you don't want to recreate a dynamic method based on that type all the time. And I'd be surprised if looking up a member by name with a specified binder is similarly fast as a simple dictionary lookup. – CodesInChaos Jul 06 '11 at 20:21

3 Answers


ConcurrentDictionary<WeakReference, CachedData> is incorrect in this case. Suppose we are trying to cache info for type T, so WeakReference.Target == typeof(T). CachedData will most likely contain a reference to typeof(T) as well. Since ConcurrentDictionary<TKey, TValue> stores items in an internal collection of Node<TKey, TValue> objects, you will have a chain of strong references: ConcurrentDictionary instance -> Node instance -> Value property (CachedData instance) -> typeof(T). In general it is impossible to avoid a memory leak with WeakReference when the values can hold references to their keys.

Support for ephemerons was necessary to make such a scenario possible without memory leaks. Fortunately .NET 4.0 supports them through the ConditionalWeakTable<TKey, TValue> class. It seems the reasons it was introduced are close to your task.

This approach also solves the problem mentioned in your update, as the reference to the Type will live exactly as long as the Assembly is loaded.
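A minimal sketch of this approach (my illustration, not the answerer's code; `CachedData` is a placeholder for whatever per-type info you build):

```csharp
using System;
using System.Runtime.CompilerServices;

// Entries in a ConditionalWeakTable behave like ephemerons: the value may
// reference its key Type without keeping the Type (or its collectible
// assembly) alive.
sealed class CachedData
{
    public Type Type;             // a back-reference to the key is safe here
    public Action<object> Action; // e.g. a generated serializer delegate
}

static class EphemeronCache
{
    private static readonly ConditionalWeakTable<Type, CachedData> cache =
        new ConditionalWeakTable<Type, CachedData>();

    public static CachedData Get(Type type)
    {
        // GetValue is thread-safe; the factory can run more than once under
        // contention, but only one instance is ever stored per key.
        return cache.GetValue(type, t => new CachedData { Type = t });
    }
}
```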

Ivan Danilov

You should check out the Fasterflect library.

You could use normal reflection to dynamically generate new code, then emit/compile it and cache the compiled version. I think the collectible-assembly idea is promising as a way to avoid the memory leak without having to load/unload a separate AppDomain. However, the memory leak should be negligible unless you're compiling hundreds of methods.

Here's a blogpost on dynamically compiling code at runtime: http://introspectingcode.blogspot.com/2011/06/dynamically-compile-code-at-runtime.html

Below is a similar concurrent-dictionary approach I've used in the past to store MethodInfo/PropertyInfo objects, and it did seem to be faster, but I think that was in an old version of Silverlight. I believe .NET has its own internal reflection cache that makes it unnecessary.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
using System.Collections.Concurrent;

namespace NetSteps.Common.Reflection
{
    public static class Reflection
    {
        private static readonly ConcurrentDictionary<Type, Dictionary<string, PropertyInfo>> reflectionPropertyCache =
            new ConcurrentDictionary<Type, Dictionary<string, PropertyInfo>>();

        public static List<PropertyInfo> FindClassProperties(Type objectType)
        {
            // Single TryGetValue instead of ContainsKey followed by the indexer.
            Dictionary<string, PropertyInfo> cached;
            if (reflectionPropertyCache.TryGetValue(objectType, out cached))
                return cached.Values.ToList();

            var result = objectType.GetProperties().ToDictionary(p => p.Name, p => p);

            reflectionPropertyCache.TryAdd(objectType, result);

            return result.Values.ToList();
        }

    }
}
Devin Garner
  • The concurrent dictionary is just the implementation I've used so far (see question), but it prevents the unloading of collectible assemblies, so I'm searching for better alternatives. – CodesInChaos Jul 08 '11 at 17:21
  • The caching used in `fasterflect` looks completely broken to me. If I understand it correctly it can get false-positive cache hits, which makes the program broken. (Check `BaseEmitter.GetDelegate()` and `BaseEmitter.GetCacheKey()`, which rely on the uniqueness of `CallInfo.GetHashCode()`.) – CodesInChaos Jul 08 '11 at 17:23
  • @CodeInChaos Fasterflect's author here; Fasterflect's cache doesn't rely on the uniqueness of GetHashCode(). Maybe you were looking at a particular old commit that I can't recall? – Buu Jul 10 '12 at 19:09
  • @BuuNguyen My comment refers to the version of Fasterflect that was current at the time I wrote it (over a year ago). It looks like you fixed it by now. I believe you were using `GetHashCode()` as key in a dictionary somewhere. – CodesInChaos Jul 10 '12 at 19:21

I might be stating the obvious here, but:

Don't cache providers typically serialise data to a source?

So surely the deserialisation process is going to be more costly than simply reflecting out a new instance?

Or did I miss something?

And there's the whole argument around boxing and unboxing time costs ... not sure if that really counts, though.

Edit:

How about this (hopefully this explains the problem a bit better)...

Dictionary<string, Type> typecache = new Dictionary<string, Type>();

// finding a type from, say, a string that points at a type in an assembly not referenced:
// very costly, but we can cache that
Type myType = GetSomeTypeWithReflection();
typecache.Add("myType", myType);

// creating an instance you can use: very costly
MyThingy thingy = (MyThingy)Activator.CreateInstance(typecache["myType"]);

are you looking to cache "thingy"?

War
  • 1
    I don't understand what you mean. I'm talking about an in memory cache(such as `ConcurrentDictionary` that's used to associate additional information, such as dynamically generated methods with a `Type`. Fast serialization is just one application of this. – CodesInChaos Jul 08 '11 at 17:20
  • oh ok ... so you're caching an instance of an object of a given type? ... but what happens when you need to use it? any changes you make would be lost as soon as another block of code went in and grabbed that instance. Of course cloning is an option ... but that uses reflection too ... so you're no better off. – War Jul 11 '11 at 09:24
  • I updated my answer ... I think this explains my thoughts better ... am I on the right track here? – War Jul 11 '11 at 09:30
  • 1
    No, I wanted to associate some additional data with a `Type` that 1) lives as long as the `Type` 2) does not prevent the assembly that contains the `Type` from being unloaded. – CodesInChaos Jul 11 '11 at 10:12