0

I have three separate codes running in separate threads.

Thread task 1: Reading data from a device and writing it into a ConcurrentDictionary.

Thread task 2: Writes the data in the ConcurrentDictionary to the computer as a separate file.

I have read many posts on the forum saying that concurrentdictionary is safe for separate threads. I've also read that there are lockout situations. In fact, more than one question mark occurred in my head.

Is there a need for locking for the concurrentdictionary? In which cases locking is required ? How can I do if locking is required? What problems does use in the following way cause?

Thread code 1: Data comes in every second.

public void FillModuleBuffer(byte[] buffer, string IpPort)
{
     if (!CommunicationDictionary.dataLogList.ContainsKey(IpPort))
     {
         CommunicationDictionary.dataLogList.TryAdd(IpPort, buffer);
     }
}

Thread code 2: The code below works in a timer. The timer duration is 200 ms.

if (CommunicationDictionary.dataLogList.ContainsKey(IpPort))
        {
          using (stream = File.Open(LogFilename, FileMode.Append, FileAccess.Write))
          {
              using (BinaryWriter writer = new BinaryWriter(stream))
              {
                 writer.Write(CommunicationDictionary.dataLogList[IpPort]);
                 writer.Flush();
                 writer.Close();
                 CommunicationDictionary.dataLogList.TryRemove(IpPort,out _);
               }
          }

}

Note: the codes have been simplified for clarity.

Note 2: I used Dictionary before that. I encountered a very different problem. While active, after 2-3 hours, I got the error that the array was out of index range even though there was no data in the Dictionary.

Eriawan Kusumawardhono
  • 4,796
  • 4
  • 46
  • 49
Ozgur Saklanmaz
  • 528
  • 3
  • 17
  • 4
    `Thread code 1:` Remove the `ContainsKey` check. It does nothing useful. – mjwills Feb 23 '21 at 06:41
  • 4
    `Thread code 2:` Replace the `ContainsKey` check with `TryGetValue`. Or, even better, consider removing _both_ and using `TryRemove` _only_. Note if you do this you will need to consider what happens if you remove the item from the dictionary and the stream writing fails. – mjwills Feb 23 '21 at 06:41
  • From the [docs](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-5.0): All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. – SomeBody Feb 23 '21 at 06:41
  • 2
    @SomeBody That may be true - but the OP's code is not 100% safe (e.g. if a third thread removes items from the concurrent dictionary then thread code 2 will eventually fail). – mjwills Feb 23 '21 at 06:42
  • @mjwills: I assumed that there really only the two threads that OP presented to us. In that case, the code looks safe to me, but I agree that it could be improved with your suggestions. – SomeBody Feb 23 '21 at 06:52
  • 1
    If there is a way that will definitely be safe vs one that _might_ be safe I'd generally encourage the "definitely safe" approach. Since code changes over time. – mjwills Feb 23 '21 at 06:53
  • @mjwills measuring the thread-safety as a percentage doesn't make much sense. A piece of code is either thread-safe, or it's not. It this specific case it's not. (regarding this phrase: *"the OP's code is not 100% safe"*) – Theodor Zoulias Feb 23 '21 at 06:54
  • @TheodorZoulias I am not sure where I said what you are claiming I said. If you are referring to the "100%" I believe you are making the exact same point I am. I was just saying it in a slightly more tactful way. :) – mjwills Feb 23 '21 at 06:56
  • I uninstalled ContainsKey in thread 1. I added a pre-security since I used Dictionary before. I guess the ConcurrentDictionary provides this security automatically. – Ozgur Saklanmaz Feb 23 '21 at 07:10
  • Thank you. I fixed the code in thread 2 with TryGetValue. Unfortunately, I do not fully understand what you mean by better. I would be glad if you explain. Is there any other point I should add in terms of security? Thank you for your answers to your suggestions. – Ozgur Saklanmaz Feb 23 '21 at 07:13
  • Better as in faster, primarily. Your original code walks into the shop to check the eggs are there. Then it walks in again to look at the eggs. Then it walks in again to remove them from the shelf. But `TryRemove` _does all three at once._ – mjwills Feb 23 '21 at 07:33
  • `I got the error that the array was out of index range even though there was no data in the Dictionary.` This is because `Dictionary` is not thread-safe. – mjwills Feb 23 '21 at 07:34
  • 1
    Saklanmaz you may be interested to read my opinion regarding [When should I use ConcurrentDictionary and Dictionary?](https://stackoverflow.com/questions/42013226/when-should-i-use-concurrentdictionary-and-dictionary/63940194#63940194) In short the `ConcurrentDictionary` can be seen as a performance optimization of a normal `Dictionary` protected by a `lock`. But what you gain in performance, you lose in flexibility. The repertoire of what a `ConcurrentDictionary` can do atomically with thread-safety is limited. – Theodor Zoulias Feb 23 '21 at 07:43
  • Thank you. I made the necessary corrections. – Ozgur Saklanmaz Feb 23 '21 at 07:43
  • Thank you @TheodorZoulias . Some question marks in my head are gone. I have never used PLINQ before and I don't know. I can research and try on it. I'm new and I learn some things over time. :) – Ozgur Saklanmaz Feb 23 '21 at 07:50
  • 1
    @TheodorZoulias its a common misconception that a dictionary with a lock is slower than a concurrent dictionary, it really depends exactly on how its used. a ConcurrentDictionary is optimized for lock-free reads, and scalable for writes. Adds both allocate and lock not to mention all the other side effects of different methods – TheGeneral Feb 23 '21 at 07:50
  • @00110001 a `Dictionary` protected by a lock can create significant contention is case multiple threads are trying to access it concurrently in tight loops. The `ConcurrentDictionary` is intended for these cases, where you expect the total number of dictionary operations per second to exceed the one million. – Theodor Zoulias Feb 23 '21 at 07:57
  • 1
    @TheodorZoulias yes I understand how the CLR/CLI manages hybrid locking with the syncblock header, spin waits, and kernel events, and also know exactly how ConcurrentDictionary works, and it really comes down to the use case, so yes you are right (partially).. Though it is fun to note, In a lot of situations concurrent style structures will take a large performance hit and should be benchmarked on hot paths because they can *and will* burn cycles and allocate a lot more than the alternative when used outside their optimized intended use cases. Anyway, i am guess you know this as well :) – TheGeneral Feb 23 '21 at 08:13
  • @00110001 agreed. That's why in general I would advise: start with a normal `Dictionary`+`lock`, and after experiencing degraded performance only then look at the `ConcurrentDictionary` as a possible solution to the performance problems. That said, a multithreading expert should be able to predict the appearance of these problems beforehand in most cases, and skip the first step. – Theodor Zoulias Feb 23 '21 at 08:20
  • I wouldn't agree generally, since it is _easier_ to shoot yourself in the foot with a `Dictionary` than a `ConcurrentDictionary`. But it is debatable. – mjwills Feb 23 '21 at 08:39
  • 1
    @mjwills my feet have lots of holes over the years ;) – TheGeneral Feb 23 '21 at 08:40
  • TO be honest, I wouldn't suggest a `Dictionary` _or_ a `ConcurrentDictionary` here. My gut feel is that OP really needs a `ConcurrentQueue`. – mjwills Feb 23 '21 at 08:49
  • @mjwills agreed. The phrase *"The code below works in a timer. The timer duration is 200 ms."* indicates that the OP has implemented a solution without having a complete knowledge of the available tools. A `BlockingCollection` could be an even better option than a `ConcurrentQueue`, and a TPL Dataflow [`ActionBlock`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.actionblock-1) could probably be the best of all options. – Theodor Zoulias Feb 23 '21 at 08:56
  • I tried using ConcurrentQueue. But I need any kind of IpPort containing key. Because the records received during recording are processed according to this IpPort key. I have to store the data of each address separately when there is more than one connection while reading the data. For this reason, I preferred ConcurrentDictionary. – Ozgur Saklanmaz Feb 23 '21 at 08:59
  • @00110001 FYI I just tried to swap a `Dictionary` with a `ConcurrentDictionary` in an implementation of a `KeyedLock` [here](https://stackoverflow.com/questions/31138179/asynchronous-locking-based-on-a-key/65256155#65256155), and the resulting performance matched your [previous comment](https://stackoverflow.com/questions/66328111/is-concurrentdictionary-safe-to-use#comment117263886_66328111). My (non published) implementation depends heavily on the `AddOrUpdate`, `TryUpdate` and `TryRemove` methods, and is ~20% slower than a lock-protected `Dictionary`, and a lot more allocatey. – Theodor Zoulias Feb 23 '21 at 17:00
  • 1
    @TheodorZoulias ahh ok yeah. the structures are great and easy to use, and can sometimes be faster, however yeah they kind of need to be benchmarked on hot paths. Anyway nice work the class by the way – TheGeneral Feb 23 '21 at 22:56

1 Answers1

1

The example code should be kind of threadsafe, but it shows a missunderstanding on how the concurrent dictionary should be used. For example:

 if (!CommunicationDictionary.dataLogList.ContainsKey(IpPort))
 {
     CommunicationDictionary.dataLogList.TryAdd(IpPort, buffer);
 }

This happens to work because there is only one thread that adds to the dictionary, but since there are separate statements the dictionary may change between them. If you look at the documentation for TryAdd you can see that it will return false if the key is already present. So no need for the ContainsKey. There are quite a few different methods with the purpose of doing multiple things at the same time, to ensure the entire operation is atomic.

Same with the reading thread. All accesses to the concurrentDictionary should be replaced with one call to TryRemove

if (CommunicationDictionary.dataLogList.TryRemove(IpPort,out var data))
    {
      using (stream = File.Open(LogFilename, FileMode.Append, FileAccess.Write))
      {
          using (BinaryWriter writer = new BinaryWriter(stream))
          {
             writer.Write(data);
             writer.Flush();
             writer.Close();
           }
      }
}

Note that this will save some datachunks, and throwaway others, without any hard guarantee what chunks will be saved or not. This might be the intended behavior, but it would be more common with a queue that ensures that all data is saved. A typical implementation would wrap a concurrentQueue in a blockingCollection with one or more producing threads, and one consuming thread. This avoids the need for a separate timer.

JonasH
  • 28,608
  • 2
  • 10
  • 23
  • First of all thanks. As I understand it, you suggest concurrentQueue instead of writing and deleting data to a buffer [] array. Please correct if I got it wrong. – Ozgur Saklanmaz Feb 23 '21 at 10:16
  • When I use TryRemove, it will still delete the data in the dataLogList if the file is busy and unable to write. Otherwise, if the file is busy and unable to write, it will throw an exception and try to rewrite depending on the timer's duration. With my research there is unfortunately no way to check if the file is busy. I got the using block on TryRemove. I think the problem has been eliminated in this way. – Ozgur Saklanmaz Feb 23 '21 at 10:35
  • 1
    @Saklanmaz That depend on what you want. If you are writing a logging system I assume you would want all data to be logged, and then your current approach will just not work, there are more fundamental problems than thread safety here. If at all possible, use an existing logging system that already works. Also, writing to the same file from multiple threads is probably not a good idea, you could perhaps try to re-add the data to the dictionary, but I would recommend rethinking your design. – JonasH Feb 23 '21 at 10:50
  • I think I got it wrong. I am not trying to write to file from multiple threads. a thread checks connections and writes the incoming data into a ConcurrentDictionary. The other thread is trying to write data written into the ConcurrentDictionary, to the file. – Ozgur Saklanmaz Feb 23 '21 at 11:03
  • @saklanmaz if you fail to write to the file because it is busy, then there is presumably some other thread or process that has a write lock on the file. – JonasH Feb 23 '21 at 11:45
  • I understood what you mean. Sometimes he writes the file in more than 200 ms, depending on the speed of the computer. Therefore the file appears to be busy. I'm not sure how to fix this problem. 400 ms is enough time for me but 400 ms feels like a very long time under normal circumstances. – Ozgur Saklanmaz Feb 23 '21 at 13:13
  • @Saklanmaz if the writes take to long you should some kind of synchronization to ensure only one write is run at a time. A Times.Timer have a SynchronizationObject for this, but there are other ways to do it. – JonasH Feb 23 '21 at 15:05
  • Thanks for your help. i fixed my code. I save the incoming data in a concurrentQueue. If the writing process is fast, I cannot close the file, increasing the time and writing the data in the concurrentQueue in 30 seconds and deleting the data in the queue. – Ozgur Saklanmaz Feb 24 '21 at 06:41