5

Suppose you have the following code

var dictionary = new Dictionary<int, object>(capacity: 2500);

var uniqueKeys = Enumerable.Range(0, 1000).ToArray();

Parallel.ForEach(uniqueKeys, key => dictionary[key] = new object());

Note that all keys are unique and that the dictionary capacity far exceeds the count of keys.

Question: Are there any conditions that would result in this code not succeeding?

Given the current implementation of Dictionary<,> and not postulating about future hypothetical internal changes can you show any proof of unsafe access?


Remarks: this is not a duplicate of Thread safety with Dictionary<int,int> in .Net or Thread safety of a Dictionary<TKey, TValue> and I do not need anyone to tell me about ConcurrentDictionary, etc.

Community
  • 1
  • 1
Chris Marisic
  • 32,487
  • 24
  • 164
  • 258
  • Instead you should do `uniqueKeys.AsParallel().ToDictionary(key => key, key => new object())` – Aron Nov 10 '15 at 18:16
  • @Aron my real world scenario is somewhat comparable to that, merging data first before indexing it. I'm trying to come up with a reason that i can't skip that step and go directly into the dictionary. [millions of objects] – Chris Marisic Nov 10 '15 at 19:19

1 Answers1

4

First of all, I want to note that I got similar situation in recent project, where we have a dictionary with keys a DateTimes (unique), dealing with it in parallel, and after initializing it we sometimes got issues with KeyNotFoundException, but we didn't preallocate the memory like you. May be the issues are solved with it? Let's talk about code you've linked.

My teacher for multi-threading programming always tells us the very same thing each time we got a question regarding the concurrency:

What if billions of threads will be here right in this moment?

So let's try to see is there a possible problem in the Dictionary.
dictionary[key] = new object() leads us to

set {
    Insert(key, value, false);
}

Insert is the main adding method, being called from many places in the Dictionary class. As you state the objects are unique, I assume that there will be no hash collisions there, and no overriding the values in first loop of methods, so let's see the rest of the code:

int index;
if (freeCount > 0) {
    index = freeList;
    freeList = entries[index].next;
    freeCount--;
}
else {
    if (count == entries.Length)
    {
        Resize();
        targetBucket = hashCode % buckets.Length;
    }
    index = count;
    count++;
}

As you initialized the dictionary with capacity 2500, else clause seems not to called at all during such situation, so let's examine if part: 1. if (freeCount > 0) { 2. // atomic assign 3. index = freeList; 4. // some calculation and atomic assign 5. freeList = entries[index].next; 6. // not threadsafe operation 7. freeCount--; 8. }

Seems like we have multiple multi-threading issues here:

  1. freeList and freeCount fields aren't volatile, so read/write to it can be error-prone.
  2. What if billions of threads will be here right in this moment?: ©
    3. index = freeList;
    Billion of threads will get the same index, as there is no synchronization between read and write for the freeList field! And after that they'll override the value for each other with a race condition:

    entries[index].hashCode = hashCode;
    entries[index].next = buckets[targetBucket];
    entries[index].key = key;
    entries[index].value = value;
    buckets[targetBucket] = index;
    version++;
    
  3. decrement operation isn't thread-safe (great interesting answer from @EricLippert), so we'll have potentially corrupted state for the dictionary.
  4. What if billions of threads will be here right in this moment?: ©
    5. freeList = entries[index].next;
    Some thread gets the freeList value into index variable, say we have 5 there (the entries contains the linked list inside them, with head equal to -1) and became inactive before overriding the freeList. Billion of threads advances the linked list position after position, and now our first one became active. He happily overrides the freeList with obsolete data, creating a ghosts in linked list, and data there can be overridden.

So there is a lot of issues can happen during your code execution, and personally I don't recommend you to use Dictionary class in such circumstances.

Community
  • 1
  • 1
VMAtm
  • 27,943
  • 17
  • 79
  • 125
  • And this is a great reason why fields should be underscore prefixed, I was not accounting for freeList being a field opposed to a local variable. freeCount tearing would have the unfortunate side effect of not being able to trust dict.Count which could be tolerated. However the risk of freeList reusing index is the real problem as this would allow data loss. – Chris Marisic Nov 11 '15 at 16:37
  • @ChrisMarisic Yep, that's blew up my mind during review. – VMAtm Nov 12 '15 at 07:32