
I have a code base in which multiple threads write to a ConcurrentDictionary, and every 60 seconds another thread clones the main CD, clears it, and continues its work on the cloned CD. Am I going to miss data if I don't use a lock while cloning and clearing the main CD? The code below demonstrates the problem:


using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;
using static System.Console;

class Program
    {
        static object lock_obj = new object();

        static async Task Main(string[] args)
        {
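            // ThreadSafeLong is the asker's own wrapper type; its definition isn't included in the question.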
            ConcurrentDictionary<string, ThreadSafeLong> cd = new ConcurrentDictionary<string, ThreadSafeLong>();


            Func<Task> addData = () =>
            {
                return Task.Run(async () =>
               {
                   var counter = 1;

                   while (true)
                   {
                       lock (lock_obj)
                       {
                           for (int i = 0; i < 100_000; i++)
                           {
                               cd.TryAdd($"{counter}:{i}", new ThreadSafeLong(i));
                               //WriteLine(i);
                           }
                           WriteLine($"Round {counter}");
                       }
                       counter++;
                       await Task.Delay(1_000);
                   }
               });
            };

            Func<Task> writeData = () =>
            {
                return Task.Run(async () =>
              {
                  while (true)
                  {
                      var sw = Stopwatch.StartNew();

                       lock (lock_obj) // clone the data and prevent any other data from being added during the clone
                      {
                          var cloned = new ConcurrentDictionary<string, ThreadSafeLong>(cd);
                          cd.Clear();
                          WriteLine($"Cloned Count: {cloned.Count}");
                      }

                      sw.Stop();
                      WriteLine($"Elapsed Time: {sw.ElapsedMilliseconds}");

                      await Task.Delay(6_000);
                  }
              });
            };

            await Task.WhenAll(addData(), writeData());

        }
    }

PS: This might somehow be related to the question here

Saeed Ganji
  • So you have multiple producers and a single consumer, but the consumer receives its work in batches. What's the benefit/requirement of using a *dictionary* specifically to hold the queued up request batches? – Damien_The_Unbeliever Apr 16 '19 at 07:47
  • I keep an aggregated value per key, I don't want every hit to be an individual input, so I used Dictionaries and I increase the value every time. – Saeed Ganji Apr 16 '19 at 07:49
  • Uncontested locks are cheap. So long as you don't hold the lock for too long, it's fine. Work at reducing the time you need to hold the lock for, rather than trying to remove it altogether. – canton7 Apr 16 '19 at 08:19
  • Why are you using a `ConcurrentDictionary`? Since you are locking before every read and write, a simple `Dictionary` should be enough. – Theodor Zoulias Apr 16 '19 at 09:02
  • @TheodorZoulias, you are right, the original question was how to clone a `ConcurrentDictionary`, but after we have gone through my sample, there is no need any more. – Saeed Ganji Apr 16 '19 at 10:38
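
For reference, a minimal sketch of the plain-Dictionary-under-lock aggregation the last two comments describe. The names (hits, RecordHit, TakeBatch) and the plain long value are placeholders, not the asker's actual types:

using System.Collections.Generic;

static readonly object lock_obj = new object();
static Dictionary<string, long> hits = new Dictionary<string, long>();

// Writer threads: aggregate one value per key instead of storing every hit.
static void RecordHit(string key, long value)
{
    lock (lock_obj)
    {
        hits.TryGetValue(key, out var current);
        hits[key] = current + value;
    }
}

// Consumer thread: swap in an empty dictionary; process the returned batch outside the lock.
static Dictionary<string, long> TakeBatch()
{
    lock (lock_obj)
    {
        var batch = hits;
        hits = new Dictionary<string, long>();
        return batch;
    }
}

Since every access goes through lock_obj, the dictionary itself doesn't need to be concurrent, which is what the last comment concludes.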

1 Answer


In these cases I would replace the dictionary with a new one instead of calling `Clear`:

lock (lock_obj)
{
    var cloned = cd;
    cd = new ConcurrentDictionary<string, ThreadSafeLong>();
}

That way the other threads either finish their writes into the old dictionary or are already working with the new one.
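
As a rough sketch (not part of the answer as posted), the swap could slot into the question's writeData loop like this, assuming the writers keep taking the same lock_obj while adding, as discussed in the comments below:

Func<Task> writeData = () =>
{
    return Task.Run(async () =>
    {
        while (true)
        {
            ConcurrentDictionary<string, ThreadSafeLong> snapshot;

            lock (lock_obj) // writers take the same lock, so no add can slip between the read and the swap
            {
                snapshot = cd;                                            // keep the filled dictionary
                cd = new ConcurrentDictionary<string, ThreadSafeLong>();  // writers use the fresh one from now on
            }

            WriteLine($"Snapshot Count: {snapshot.Count}"); // process the snapshot outside the lock

            await Task.Delay(6_000);
        }
    });
};

Only the reference swap happens under the lock; counting or otherwise processing the snapshot happens after the lock is released.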

Oliver
  • without any locking? – Saeed Ganji Apr 16 '19 at 08:00
  • I changed the sample to what you said, and it seems that sometimes one item is missed!!! – Saeed Ganji Apr 16 '19 at 08:09
  • @SaeedGanji You need locking. Otherwise the writing thread might be writing to the cloned copy for a period, even after you've switched in a new one. – canton7 Apr 16 '19 at 08:18
  • @canton7, you mean I should use Oliver's approach instead of mine, but keep the clone/reassign operations in a lock, right? – Saeed Ganji Apr 16 '19 at 08:20
  • @SaeedGanji Oliver's approach is going to be cheaper than copying every item in one dictionary into another. It doesn't change any of the locking needs however - the risk is that writers keep writing to the clone, even after you've created your new empty ConcurrentDictionary. If that's something you need to avoid, you need locking to make sure that all writers have stopped writing before you make the clone. You could use a ReaderWriterLockSlim to do this - threads attempting to write to the dictionary get a read lock, and the thread cloning gets a write lock – canton7 Apr 16 '19 at 08:23
  • @canton7 if they write to the cloned one I would have no problem, since I won't miss any data that I need to process. They will, however, write into the main CD after the reassignment, right? I just could not figure out why I sometimes lose one item, and only one. (One item is missing during the clone/reassign process.) – Saeed Ganji Apr 16 '19 at 08:29
  • @SaeedGanji the problem is if they write to the cloned one *after* you've processed it! When you write to a field without a lock, only the thread which did the write notices straight away. Other threads might take a while to notice. The lock adds a memory barrier, which avoids this, so long as both the readers and writers acquire the lock. Writing threads will also need to lock while writing to the CD, to avoid the clone being created after a writer thread reads the CD field, but before it writes to the CD. – canton7 Apr 16 '19 at 08:33
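
A rough sketch of the ReaderWriterLockSlim idea canton7 suggests above: writer threads take the read lock (any number at once, so adds stay concurrent), and the snapshot thread takes the write lock, which waits for all in-flight writers before the swap. HitStore, Add and TakeSnapshot are placeholder names; ThreadSafeLong is the asker's type from the question:

using System.Collections.Concurrent;
using System.Threading;

static class HitStore
{
    static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    static ConcurrentDictionary<string, ThreadSafeLong> cd =
        new ConcurrentDictionary<string, ThreadSafeLong>();

    // Writer threads: many can hold the read lock at the same time,
    // so the ConcurrentDictionary still handles the actual write contention.
    public static void Add(string key, int value)
    {
        rwLock.EnterReadLock();
        try
        {
            cd.TryAdd(key, new ThreadSafeLong(value));
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    // Snapshot thread: the write lock blocks until every in-flight writer has exited,
    // so nothing can land in the old dictionary after the swap.
    public static ConcurrentDictionary<string, ThreadSafeLong> TakeSnapshot()
    {
        rwLock.EnterWriteLock();
        try
        {
            var snapshot = cd;
            cd = new ConcurrentDictionary<string, ThreadSafeLong>();
            return snapshot;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}

This closes the window described in the last comment: a writer cannot read the cd field, pause, and then write into a dictionary that has already been swapped out and processed.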