2

I have a Dictionary<string, List<MyObject>> and I need to run some resource intensive operations on List<MyObject>. I'm trying to figure out if I can have one thread per key of the Dictionary doing the resource intensive tasks so that each thread updates its key's List. In other words, multiple threads simultaneously updating different items in the dictionary?

Please consider the following simplified pseudo code -

public void MyMethod() {
    //The myDict object needs to be shared by all threads.
    Dictionary<string, List<MyObject>> myDict = new Dictionary<string, List<MyObject>>();
    //GetKeyValue() may return the same key multiple times
    foreach(var kv in GetKeyValue()) { 
        if(myDict.ContainsKey(kv.Key) { myDict[kv.Key].Add(kv.Value); }
        else { myDict.Add(kv.Key, kv.Value); }
        Task.Factory.StartNew(() => { RunSubsetSum(kv.Key, myDict); });
    }
}
//Resource intensive method
public void RunSubsetSum(string key, Dictionary<string, List<MyObject>> myDict)  { 
    //Lock on key so that no two threads run for the same key
    lock(key){
        foreach(var valueToRemove in GetRemovableObjs()) 
            myDict[kv.Key].Remove(valueToRemove);
    }
}

Basically, the idea is that -

  1. No two threads run for the same key at the same time - Will lock(key) make them queue (run sequentially)?
  2. All running threads can independently update the same Dictionary - I expect multiple threads to be able to update different items in the same Dictionary simultaneously.

I tried the above approach but results seem to be inconsistent. I think it is because MyMethod() updates the Dictionary for a key for which RunSubsetSum() is already running but not sure how to lock on the key in MyMethod() without interrupting the loop for other keys. I wonder if C# provides simpler solution to this problem. Any thoughts?

Note: I'm considering creating a Dictionary so that I can keep track of which keys are currently being worked upon and updating MyMethod() to buffer the keys until threads finish, but I want to avoid adding this if I can to avoid overcomplicating the logic.

Achilles
  • 1,099
  • 12
  • 29
  • Why don't you use a `ConcurrentDictionary`? – Tim Schmelter Jan 05 '17 at 15:39
  • @TimSchmelter He doesn't need one (the dictionary is only ever accessed from one thread), and that wouldn't solve the problem he has (of the values in the dictionary being accessed from multiple threads). – Servy Jan 05 '17 at 15:39
  • @Servy: of course, his idea is to have one thread for every key in the dictionary. But my suggestion is to use a `ConcurrentDictionary` to avoid that. He needs to lock the list if he accesses it – Tim Schmelter Jan 05 '17 at 15:40
  • Again, the dictionary doesn't need to be accessed from multiple threads in the first place, a single thread is managing the dictionary and pulling out values that are then used by multiple threads, so there's no need for a `ConcurrentDictionary`, and on top of that, using one *doesn't solve the problem* of multiple values being accessed by different threads. – Servy Jan 05 '17 at 15:42
  • Apologies, the pseudo code doesn't reflect it but the dictionary object is supposed to be shared by all threads. The idea that different threads work off the same dictionary and update their own keys potentially simultaneously and independently. – Achilles Jan 05 '17 at 15:51

3 Answers3

5

You shouldn't ever lock on a string. You're just opening yourself up for a world of hurt, mostly centering around string interning. Each time you use a string literal that is semantically identical to another string literal, they'll have the same reference (unless you turn it off), which means if any of your dictionary keys end up being string literals, then some other code somewhere else in the application domain that has nothing to do with your code could end up locking on the same value. This could end up either resulting in deadlocks, or the two segments waiting at times that they don't actually need to wait.

You should only ever lock on an object that you can be certain only the one type managing the synchronization can ever have access to. Doing this will mean that you can always look at just this one class to analyze the synchronization going on, and that you don't need to worry about what's going on in the rest of the application to determine the synchronization logic of this class.

Fortunately, you already do have an object that corresponds to each key, and that you never expose outside of this class, the List. You don't need to have a separate dictionary of objects to lock on, you can just use the List.

Another problem that you have is that you're fetching the value of the dictionary in your worker thread, which isn't safe, as you're modifying it in another thread, although this is trivially solved by simply fetching the value of the dictionary before starting the new thread, rather than after, and simply passing the string and List into RunSubsetSum rather than the Dictionary.

You're also mutating the List objects from both the worker threads and the main thread, so you'll need to ensure that the caller locks around the list before using it, as well as the workers.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • Thank you. I can get rid of locking on the string key. If I were to pass only the string and List in RunSubsetSum, would it be updating the List object in the Dictionary? Basically, I'm hoping to avoid having RunSubsetSum work on its own copy of List because so many copies could hog the resources even further. That was the motivation for want to run all the threads on the same dictionary object. – Achilles Jan 05 '17 at 16:07
  • @Achilles Yes. `List` is a reference type, so you're passing a copy of the reference, not creating a new list. – Servy Jan 05 '17 at 16:08
  • Sounds great, many thanks. I'll update it to lock on the List instead. How to modify `MyMethod()` though so that it doesn't update a dictionary item for which a thread is already running? If I were to just apply the same lock to `if(myDict.ContainsKey(kv.Key) ....` part, would it solve the problem. It makes sense to keep the `foreach(var kv in GetKeyValue())` loop running, just not let it update an item for which a thread is running. – Achilles Jan 05 '17 at 16:19
  • @Achilles You would need to `lock` around all mutations to the `List`, that's all. – Servy Jan 05 '17 at 16:21
  • Got it. If I apply lock within a loop, does it block the loop or keep the loop going while keeping the locked part in a queue somewhere? – Achilles Jan 05 '17 at 16:25
  • @Achilles Any time you *ever* hit a `lock`, the code will wait until it can access the `lock` and then run it when it can. – Servy Jan 05 '17 at 16:26
  • Ok thank you. Guess I'll have to take some sort of buffering approach so that the loop can still run for other keyvalues for which no threads are running. – Achilles Jan 05 '17 at 16:37
0

lock on string is not going to help sync'ing resource. Since string is immutable object. Everytime, string is passed to the function will lead to new string. More importantly both methods are touching the dictionary at the same time. Would advise to combine them both and call from however your want.

private readonly object _padLock = new object();
public void CallingMethod() {
   Task.Factory.StartNew(() => { MyMethod(); })
}

public void MyMethod() {
   lock (_padLock)
   {
      //The myDict object needs to be shared by all threads.
      Dictionary<string, List<MyObject>> myDict = new Dictionary<string, List<MyObject>>();
      //GetKeyValue() may return the same key multiple times
      foreach(var kv in GetKeyValue()) { 
        if(myDict.ContainsKey(kv.Key) { myDict[kv.Key].Add(kv.Value); }
        else { myDict.Add(kv.Key, kv.Value); }
        RunSubsetSum(kv.Key, myDict);
      }
   }
}
//Resource intensive method
public void RunSubsetSum(string key, Dictionary<string, List<MyObject>> myDict)  { 
    //Lock on key so that no two threads run for the same key
        foreach(var valueToRemove in GetRemovableObjs()) 
            myDict[kv.Key].Remove(valueToRemove);
}
Cinchoo
  • 6,088
  • 2
  • 19
  • 34
  • Yes, `string` is immutable, but no, passing a `string` to a function does not create a new string. It passes a copy of the reference to the string, so it will have the same reference. There are other problems with locking on a `string`, as I address in my answer, but they're completely different. As for the rest of your changes, you've changed the code from processing each value in its own thread, as per the stated requirements, to simply synchronizing the whole thing and never using any additional threads at all, which simply isn't what was asked for. – Servy Jan 05 '17 at 16:10
-1

Servy is correct about locking on the string. The simply solution is to create a private field just for locking:

object LockMe = new object();

public void SomeMethod()
{
    lock(LockMe)
    {
        <... do something here ...>
    }
}

The other issue is to keep in mind that there is a maximum number of allowed threads per application so, if you are creating a thread for each key in the Dictionary, you run the risk of reaching that maximum thread count.

You might want to rethink your threading model.

Mike U
  • 709
  • 6
  • 11
  • Thank you. The reason behind locking on the string key was to ensure that no two threads work on the same dictionary item. If we lock on a private field, it would lock the entire dictionary up, right? So, no more processing multiple dictionary items in parallel? – Achilles Jan 05 '17 at 16:09
  • @Achilles Yes, unless you had an object for each key value, as you mentioned in the question. – Servy Jan 05 '17 at 16:11
  • Ah, yes. You are correct. I wasn't think about that. You would need a separate object instance for each key. – Mike U Jan 05 '17 at 22:07