3

I'm trying to offload work from my database server by introducing a cache layer for some very central functions that insert a value to a table in the database and retrieves the id. This is in a multi-threaded environment.

My first approach was:

public class Cache {
      private Dictionary<string, Int64> i;

      public void Init() { /* init i with values from DB */ }

      public Int64 Get(string value)
         lock(i) {
            Int64 id;
            if (cache.i.TryGetValue(value, out id))
                return id;

            id = /* Insert to DB and retrieve ID */
            cache.i[value] = id;
            return id;
      }
 }

This helped. However the threads still wait a lot for each other. I'd like to reduce this waiting time. My first thought was to use ConcurrentDictionary.GetOrAdd(key, valueFactory). This would not work because valueFactory could be called more than once.

I've wound up at this approach:

public class Cache
{
    private ConcurrentDictionary<string, Int64> i;

    public void Init() { /* init i with values from DB */ }

    public Int64 Get(string value)
    {
        Int64 id;
        if (i.TryGetValue(value, out id))
            return id;

        lock (i)
        {
            if (i.TryGetValue(value, out id))
                return id;

            id = /* Insert to DB and retrieve ID */
            i.TryAdd(value, id);
            return id;
        }
    }

Is there a better way of doing this? Is this even thread-safe?

Kasper Videbæk
  • 218
  • 2
  • 8
  • 1
    i wouldn't lock on the resource you are trying to modify. – Daniel A. White Jul 23 '14 at 19:52
  • 1
    @DanielA.White Why not? It's a perfectly fine practice. – Servy Jul 23 '14 at 19:53
  • 1
    @Servy you don't know if the .net framework could be locking on that reference. – Daniel A. White Jul 23 '14 at 19:54
  • 1
    @Servy http://stackoverflow.com/questions/251391/why-is-lockthis-bad – Daniel A. White Jul 23 '14 at 19:56
  • 2
    @DanielA.White I would hope that the definition of `ConcurrentDictionary` follows the advice in the article you linked and doesn't doesn't lock on itself because MS knows that it's a bad idea to do so. A cursory check through it's definition leads me to believe that it doesn't lock on itself, as would be expected. – Servy Jul 23 '14 at 19:57
  • A related answer can be found here: http://stackoverflow.com/a/12611341/1236734 – JG in SD May 19 '16 at 14:58

2 Answers2

12

What you're trying to do is lazily create an object that needs to be created no more than once and then accessed by any number of threads once created. Lazy is designed for exactly this:

public class Cache
{
    private ConcurrentDictionary<string, Lazy<long>> i;

    public void Init() { /* init i with values from DB */ }

    public Int64 Get(string value)
    {
        return i.GetOrAdd(value, new Lazy<long>(() =>
            CreateDatabaseRecordAndGetId()))
            .Value;
    }

    private long CreateDatabaseRecordAndGetId()
    {
        throw new NotImplementedException();
    }
}
Servy
  • 202,030
  • 26
  • 332
  • 449
  • Thanks for pointing me to Lazy. Wouldn't LazyThreadSafetyMode.ExecutionAndPublication be needed though? – Kasper Videbæk Jul 23 '14 at 20:05
  • @KasperVidebæk That is the default behavior. Basically it expects that everything needs to be fully synchronized unless you specifically tell it *not* to. – Servy Jul 23 '14 at 20:08
  • Ok. Thanks. Would this approach be more thread safe or faster than my current approach? – Kasper Videbæk Jul 23 '14 at 20:36
  • 1
    @KasperVidebæk Both are safe. It's faster in that if two threads each try to add a new item they don't need to wait on each other to run their initialization methods. In your case whenever one thread is initialing a value no other thread can be initializing any other value. It's also clearer and easier to work with as it's working with higher level tools that represent exactly what you want to do, and so don't have that margin for error or misunderstanding. – Servy Jul 23 '14 at 20:39
  • My caches went from having a memory footprint of 1.6GB to having one of 2.3GB. I edited to create a version that does not need the lazy objects to be stored. Would you agree that this works? – Kasper Videbæk Jul 25 '14 at 14:48
  • @KasperVidebæk Your edit is invalid and removes the safety of the implementation. Your solution would result in computing the value multiple times if the value was requested by multiple threads at the same time. – Servy Jul 25 '14 at 14:50
  • @KasperVidebæk No, it does not work. Using less memory doesn't matter when the solution doesn't work. Your solution is no different then not using `Lazy` at all. – Servy Jul 25 '14 at 14:50
  • I see. Memory is unfortunately an issue in some of the deployment environments and this 35% memory increase pushed me over my limits and started giving me paging issues. Do you have any ideas? – Kasper Videbæk Jul 25 '14 at 15:20
  • @KasperVidebæk As always, you have the choice between trading off memory for speed. If you need to use less memory, you need to make your program slower. In this case that likely means using more granular locking, which means more waiting for this that don't strictly need to be waited on. If you can live with the reduced runtime performance, you won't need as much memory. – Servy Jul 25 '14 at 15:23
  • Ok. I had no performance issues with the solution in my post - cache misses are quite rare. I was mostly wondering if it was threads-safe and if there was a better (more clean) way of implementing it. I guess this is more clean, but the increased memory footprint does not really trade into extra performance in this case. – Kasper Videbæk Jul 25 '14 at 15:40
1

Just FYI, in Servy's example you get an instance of Lazy created for every call to GetOrAdd. Now, the magic of Lazy still happens, and you only get one call to your Func that creates your instance. But maybe the extra instantiations of Lazy in the above example explain the memory increase you saw when you tried it.

If you create a "double" lambda, you don't get multiple instantiations of Lazy.

E.g. paste this into a console app and compare the implementation with and without the x => new Lazy... below:

public static class LazyEvaluationTesting
{
    private static readonly ConcurrentDictionary<int, CustomLazy<CacheableItem>>
        cacheableItemCache = new ConcurrentDictionary<int, CustomLazy<CacheableItem>>();

    private static CacheableItem RetrieveCacheableItem(int itemId)
    {
        Console.WriteLine("--RETRIEVE called\t ItemId [{0}] ThreadId [{1}]", itemId, Thread.CurrentThread.ManagedThreadId);
        return new CacheableItem
        {
            ItemId = itemId
        };
    }

    private static void GetCacheableItem(int itemId)
    {
        Console.WriteLine("GET called\t ItemId [{0}] ThreadId [{1}]", itemId, Thread.CurrentThread.ManagedThreadId);

        CacheableItem cacheableItem = cacheableItemCache
            .GetOrAdd(itemId,
                x => new CustomLazy<CacheableItem>(
                    () => RetrieveCacheableItem(itemId)
                )
            ).Value;

        //CacheableItem cacheableItem2 = cacheableItemCache
        //  .GetOrAdd(itemId,
        //      new CustomLazy<CacheableItem>(
        //          () => RetrieveCacheableItem(itemId)
        //      )
        //  ).Value;
    }

    public static void TestLazyEvaluation()
    {
        int[] itemIds = { 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5 };
        ParallelOptions options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 75
        };

        Parallel.ForEach(itemIds, options, itemId =>
        {
            GetCacheableItem(itemId);
            GetCacheableItem(itemId);
            GetCacheableItem(itemId);
            GetCacheableItem(itemId);
            GetCacheableItem(itemId);
        });
    }

    private class CustomLazy<T> : Lazy<T> where T : class
    {
        public CustomLazy(Func<T> valueFactory)
            : base(valueFactory)
        {
            Console.WriteLine("-Lazy Constructor called  ThreadId [{0}]", Thread.CurrentThread.ManagedThreadId);
        }
    }

    private class CacheableItem
    {
        public int ItemId { get; set; }
    }
}

Source: Reed Copsey's Blog

sliderhouserules
  • 3,415
  • 24
  • 32
  • You're right that you don't get multiple instantiations of Lazy, but you can get multiple instantiations of CustomLazy. Given that the second Lazy gets dropped and only one instance of Lazy ever gets its Value property accessed, I don't see how this is better. – bmm6o Jun 10 '15 at 16:40
  • Umm... I'm not sure how you're coming to that conclusion. CustomLazy, in my example, inherits from Lazy, and does nothing more than output a message in the constructor to let you see whether it was called or not. It acts just like Lazy in every other way. Did you actually run my sample code, or are you just making an armchair observation? Because when I ran it, I only saw my messages being output once. – sliderhouserules Jun 12 '15 at 22:40
  • I'm also going to guess that you didn't read the blog entry I linked to? – sliderhouserules Jun 12 '15 at 22:42
  • Sorry, my note wasn't very clear and I muddled up what I was trying to say. Your version is better in that it doesn't always create a Lazy to pass in to GetOrAdd. I just wanted to point out that it is possible to have extra Lazy instances created due to race conditions, see http://pastebin.com/rvDry7ED for code that demonstrates this. – bmm6o Jun 16 '15 at 01:06
  • And while the version of GetOrAdd that takes a callback doesn't contain an explicit allocation like `new Lazy()`, it does pass in an object (of type Func<>) that has to come from somewhere. There's some flexibility with how that ends up getting compiled/JITted, but note that the second callback references itemId and so in most implementations a closure has to be constructed. There are obviously a lot of factors that go into this, but I wouldn't say that your rewrite always produces fewer allocations. – bmm6o Jun 16 '15 at 01:14
  • I made a .NET Fiddle with a slight variant of your code (more output to make point more clear). https://dotnetfiddle.net/lhGYpD The version with lambda does not call the lazy constructor more than once. – PeterVermont Aug 29 '16 at 20:12