
I'm trying to build a model where there will be multiple reads of an entire collection and rare additions and modifications to it.

I thought I might use the ConcurrentBag in .NET as I've read the documentation and it's supposed to be good for concurrent reads and writes.

The code would look like this:

public class Cache 
{     
   ConcurrentBag<string> cache = new ConcurrentBag<string>();

   // this method gets called frequently
   public IEnumerable<string> GetAllEntries()
   { 
         return cache.ToList();
   }

   // this method gets rarely called 
   public void Add(string newEntry) 
   {  
         // add to concurrentBag
   }

   public void Remove(string entryToRemove)
   {
        // remove from concurrent bag
   } 
} 

However, I've decompiled the ConcurrentBag class, and in GetEnumerator a lock is always taken, which means any call to GetAllEntries will lock the entire collection and it will not perform well.

To get around this, I'm thinking of coding it in this manner instead, using a list:

 public class Cache 
 {     
    private object guard = new object();

    IList<string> cache = new List<string>();

   // this method gets called frequently
   public IEnumerable<string> GetAllEntries()
   { 
     var currentCache = cache; 
     return currentCache;
   }

   // this method gets rarely called 
   public void Add(string newEntry) 
   {  
       lock (guard) 
       {
            cache.Add(newEntry); 
       }
   }

   public void Remove(string entryToRemove)
   {
      lock (guard) 
      {
           cache.Remove(entryToRemove);
      }
   } 
} 

Since Add and Remove are rarely called, I don't care too much about locking access to the list there. On Get I might get a stale version of the list, but again I don't care; it will be fine for the next request.

Is the second implementation a good way to go?

EDIT

I've run a quick performance test and the results are the following:

Setup: populated the in memory collection with 10000 strings.

Action: GetAllEntries concurrently 50000 times.

Result:

00:00:35.2393871 to finish the operation using ConcurrentBag (first implementation)

00:00:00.0036959 to finish the operation using a normal list (second implementation)

Code below:

 class Program
 {
    static void Main(string[] args)
    {
        // warmup caches and stopwatch
        var cacheWitBag = new CacheWithBag();
        var cacheWitList = new CacheWithList();

        cacheWitBag.Add("abc");
        cacheWitBag.GetAllEntries();

        cacheWitList.Add("abc");
        cacheWitList.GetAllEntries();


        var sw = new Stopwatch();
        // warm up the stopwatch as well
        sw.Start();

        // initialize caches (rare writes so no real reason to measure here)
        for (int i =0; i < 50000; i++)
        {
            cacheWitBag.Add(Guid.NewGuid().ToString());
            cacheWitList.Add(Guid.NewGuid().ToString());
        }
        sw.Stop();

        // measure
        var program = new Program();

        sw.Restart();
        program.Run(cacheWitBag).Wait();
        sw.Stop();
        Console.WriteLine(sw.Elapsed);

        sw.Restart();
        program.Run2(cacheWitList).Wait();
        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }

    public async Task Run(CacheWithBag cache1)
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < 10000; i++)
        {
            tasks.Add(Task.Run(() => cache1.GetAllEntries()));
        }

        await Task.WhenAll(tasks);
    }
    public async Task Run2(CacheWithList cache)
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < 10000; i++)
        {
            tasks.Add(Task.Run(() => cache.GetAllEntries()));
        }

        await Task.WhenAll(tasks);
    }

    public class CacheWithBag
    {
        ConcurrentBag<string> cache = new ConcurrentBag<string>();

        // this method gets called frequently
        public IEnumerable<string> GetAllEntries()
        {
            return cache.ToList();
        }

        // this method gets rarely called 
        public void Add(string newEntry)
        {
            cache.Add(newEntry);
        }
    }


    public class CacheWithList
    {
        private object guard = new object();

        IList<string> cache = new List<string>();

        // this method gets called frequently
        public IEnumerable<string> GetAllEntries()
        {
            var currentCache = cache;
            return currentCache;
        }

        // this method gets rarely called 
        public void Add(string newEntry)
        {
            lock (guard)
            {
                cache.Add(newEntry);
            }
        }

        public void Remove(string entryToRemove)
        {
            lock (guard)
            {
                cache.Remove(entryToRemove);
            }
        }
    }

}


Dan Dinu
  • Can you explain `var currentCache = cache; return currentCache;`? Do you expect it to be any different from just `return cache;`? Did you consider using a reader-writer lock instead? – user4003407 Jan 27 '18 at 19:15
  • yeah, actually I could just `return cache`; it should be the same. I know that any kind of locking will decrease performance in the async/await paradigm, so I'm trying to have lock-free code. – Dan Dinu Jan 27 '18 at 19:20
  • Well if the rare operation is adding or removing, I'd simply use an immutable collection and update the cache reference whenever something is added or removed. – InBetween Jan 27 '18 at 19:20
  • Your code is absolutely not correct. Please check out https://stackoverflow.com/questions/18066429/shallow-copy-or-deep-copy (or any other "assignment vs. shallow copy vs. deep copy") and think about what would happen when you are iterating over the items and a modification comes in at the same time. – Alexei Levenkov Jan 27 '18 at 19:52
  • Making a copy of a collection is the sledge-hammer solution to making iteration thread-safe. It trades holding the lock for a short time, and thus making adding/removing quick, for the expense of the copy. The opposite of what you said you wanted. And you have to reason through a possible failure mode, the copy contains elements that might have been removed. In other words, it is instantly stale, in itself a threading race. Re-inventing the MemoryCache class is not a great idea, it was made for this. – Hans Passant Jan 27 '18 at 20:28
  • 1
    `ConcurrentBag` is not designed for concurrent enumeration. Enumerating a `ConcurrentBag` actually locks and copies the entire collection. Its intended use is for parallel computation that inserts into the bag and at the end *after the parallel work* grabs all the results. – Cory Nelson Jan 27 '18 at 21:31

3 Answers


To improve on InBetween's solution:

class Cache 
{
   ImmutableHashSet<string> cache = ImmutableHashSet.Create<string>();

   public IEnumerable<string> GetAllEntries()
   {   
       return cache;
   }

   public void Add(string newEntry) 
   {
       ImmutableInterlocked.Update(ref cache, (set,item) => set.Add(item), newEntry);
   }

   public void Remove(string entryToRemove)
   {
       ImmutableInterlocked.Update(ref cache, (set, item) => set.Remove(item), entryToRemove);
   }
}

This performs only atomic operations (no locking) and uses the .NET Immutable types.
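As a rough sketch of why no update is lost without a lock (`ImmutableInterlocked.Update` re-reads the field and retries its compare-and-swap whenever another thread wins the race; the class and method names below are illustrative, not from the question):

```csharp
using System;
using System.Collections.Immutable;
using System.Threading.Tasks;

class Demo
{
    static ImmutableHashSet<string> cache = ImmutableHashSet.Create<string>();

    // Hammer the cache with 100 concurrent writers and report the final count.
    public static int Run()
    {
        Parallel.For(0, 100, i =>
            // Update retries on contention, so every addition survives
            // even though there is no lock anywhere.
            ImmutableInterlocked.Update(ref cache, (set, item) => set.Add(item), i.ToString()));

        return cache.Count; // 100
    }

    static void Main() => Console.WriteLine(Run());
}
```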

Cory Nelson

In your current scenario, where Add and Remove are rarely called, I'd consider the following approach:

public class Cache 
{     
    private object guard = new object();
    SomeImmutableCollection<string> cache = new SomeImmutableCollection<string>();

   // this method gets called frequently
   public IEnumerable<string> GetAllEntries()
   {   
       return cache;
   }

   // this method gets rarely called 
   public void Add(string newEntry) 
   {  
       lock (guard) 
       {
           cache = cache.Add(newEntry); 
       }
   }

   public void Remove(string entryToRemove)
   {
      lock (guard) 
      {
          cache = cache.Remove(entryToRemove);
      }
   } 
}

The fundamental change here is that cache is now an immutable collection, which means it can't change, ever. So concurrency problems with the collection itself simply disappear; something that can't change is inherently thread safe.

Also, depending on how rare calls to Add and Remove are, you could even consider removing the lock in both of them, because all it's doing now is avoiding a race between Add and Remove and the potential loss of a cache update. If that scenario is very improbable you could get away with it. That said, I very much doubt the few nanoseconds an uncontended lock takes are a relevant factor here ;)

SomeImmutableCollection is a placeholder for whichever of the collections found in System.Collections.Immutable best suits your needs.
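For instance, a concrete sketch of the pattern above using `ImmutableList<string>` from that namespace (mirroring the question's Cache shape; the choice of ImmutableList is just one option):

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

public class Cache
{
    private readonly object guard = new object();
    private ImmutableList<string> cache = ImmutableList<string>.Empty;

    // Readers get a snapshot reference; the list it points to never mutates.
    public IEnumerable<string> GetAllEntries() => cache;

    public void Add(string newEntry)
    {
        lock (guard)
        {
            cache = cache.Add(newEntry); // Add returns a new list; swap the reference
        }
    }

    public void Remove(string entryToRemove)
    {
        lock (guard)
        {
            cache = cache.Remove(entryToRemove);
        }
    }
}
```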

InBetween

Instead of a `lock` on a guard object to protect a simple container, you should consider `ReaderWriterLockSlim`, which is optimized for the read/write scenario: multiple readers are allowed at the same time, but only one writer is allowed, and it blocks other readers and writers. It is very useful in your scenario, where you read a lot but write only rarely.

Please note you can be a reader and then, for some reason, decide to become a writer (upgrade the slim lock) in your "reading" code.
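A rough sketch of the question's cache rewritten with `ReaderWriterLockSlim` (assuming the read path should still hand back a copy, so callers never enumerate a list a writer is mutating):

```csharp
using System.Collections.Generic;
using System.Threading;

public class Cache
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly List<string> cache = new List<string>();

    // Many readers can hold the read lock at the same time.
    public IEnumerable<string> GetAllEntries()
    {
        rwLock.EnterReadLock();
        try
        {
            return new List<string>(cache); // copy out while no writer can mutate
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    // Writers are exclusive: they block readers and other writers.
    public void Add(string newEntry)
    {
        rwLock.EnterWriteLock();
        try { cache.Add(newEntry); }
        finally { rwLock.ExitWriteLock(); }
    }

    public void Remove(string entryToRemove)
    {
        rwLock.EnterWriteLock();
        try { cache.Remove(entryToRemove); }
        finally { rwLock.ExitWriteLock(); }
    }
}
```

For the read-then-upgrade scenario mentioned above, `EnterUpgradeableReadLock` is the relevant method; note that only one thread may hold the upgradeable lock at a time.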

Antonio