0

We're on a high traffic website where we want to do some very simple caching for a calculation using a ConcurrentDictionary, to prevent it from being done for every request. The number of possible inputs is limited enough and the calculation relatively heavy (splitting and rejoining a string).

I'm considering code like

string result;
if (!MyConCurrentDictionary.TryGetValue(someKey, out result))
{
    result = DoCalculation(someKey);
    // alternative 1: use Item property
    MyConcurrentDictionary[someKey] = result;
    //alternative 2: use TryAdd
    MyConcurrentDictionary.TryAdd(someKey, result);
}

My question is: which alternative is the best choice from a performance perspective?

Henk Kok
  • 353
  • 6
  • 10
  • 2
    [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) – René Vogt Aug 02 '16 at 16:49
  • Can you give some samples of the strings? I think the best way is to parse on lines. You probably are getting a stream where the data you are parsing may be in two consecutive request. So you will be combining data between the two requests. You may want to search for parsing algorithms of streams to find best. – jdweng Aug 02 '16 at 16:55
  • @jdweng: we're manipulating the request path (which is not open for discussion) for which we split the path, add, change and/or remove elements and then join the elements back to a string. In any case, we think that a hasing plus key lookup are faster than the calculation. – Henk Kok Aug 03 '16 at 08:05
  • @EliArbel: can you explain? Both methods will add the key with value if it isn't present and will not generate an error if it does, which is what we are aiming for. – Henk Kok Aug 03 '16 at 08:06
  • @RenéVogt: oops, sorry, deleted my not so smart comment – Henk Kok Aug 03 '16 at 08:09
  • Do you expect duplicate keys? – jdweng Aug 03 '16 at 09:14
  • @EliArbel: ok, clear, we're aware of that. The question remains: is it faster to use TryAdd, which we assume must either do some (optional?) locking around a "check if key exists, if not add" code block, or do exception handling for duplicate keys, or use the Item property setter, which will use locking in som eway too. – Henk Kok Aug 03 '16 at 10:02
  • If it's "relatively heavy", then the cost of adding should be negligible. Second, your code doesn't prevent 50 threads from calling `DoCalculation` concurrently. And I would not really call "splitting and rejoining a string" a "relatively heavy" operation; the only way you will benefit from caching is to avoid multiple allocation. – vgru Aug 03 '16 at 11:08

2 Answers2

8

Your code is completely broken; you're assuming that nothing will change between those two lines.

You need to use .GetOrAdd().

In general, when dealing with a mutable concurrent object, you must never make multiple calls to the object for the same thing, since its state can change at any time.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • Based on the documentation `GetOrAdd` isn't blocking. That is if you call it for a key that is not present on two threads at the same time the `Func` will be called for both instead of it blocking one while the other runs. I assume that's the behavior you were expecting it to have. – juharr Aug 02 '16 at 16:56
  • We're fully aware that the ConcurrentDictionary can (and given the load, is likely to) change between the TryGetValue call and the two alternative calls to add the key if TryGetValue returned false. We think both alternative calls handle the case that the key was added while the DoCalculation was done logically correct. The question is: which method is the most performant and the least blocking? GetOrAdd requires the calculation to be executed everytime, which is what we are trying to prevent in the first place. – Henk Kok Aug 03 '16 at 07:59
  • @HenkKok: but the point is, why would you possibly want to allow this, when concurrent collections were designed specifically to prevent this? Calling `TryGetValue` and then setting the value in succession is certainly not faster than one call to `GetOrAdd`, in addition to not being atomic. – vgru Aug 03 '16 at 10:56
  • @juharr: while it's possible for the delegate to get called twice, the value returned by `GetOrAdd` will be the same in both cases. There aren't any built in mechanisms to prevent this apart from doing standard double-checked locking yourself. – vgru Aug 03 '16 at 10:57
  • @Groo: a GetOrAdd call replacing the whole if block would require unconditial execution of the calculation (to get the value to add if not gotten). This was the whole point of the Dictionary. By the way, GetOrAdd is not atomic as well: see https://msdn.microsoft.com/en-us/library/dd997369(v=vs.110).aspx (remarks below code) – Henk Kok Aug 03 '16 at 12:15
  • 3
    @HenkKok: what do you mean by "unconditional execution"? The delegate is only evaluated if it's not already inside the dictionary, just like you are doing right now. We are talking about the [overload which accepts a `Func` delegate](https://msdn.microsoft.com/en-us/library/ee378677(v=vs.110).aspx). Regarding atomicity, it is not guaranteed that the delegate won't be called multiple times (just like in your code), but it guarantees that the return value will always be the same instance for all threads. – vgru Aug 03 '16 at 12:21
  • Yes, the [MSDN example](https://msdn.microsoft.com/en-us/library/dd997369(v=vs.110).aspx) is ridiculous, I agree, as is often the case with their examples (probably the give this job to interns who don't really understand what they are doing). It creates the value on each call to `GetOrAdd`, instead of passing a lazy delegate. In your case, it likely doesn't even matter if you get two different `string` instances from your method (presuming `DoCalculation` always returns the same result given the same input), but generally you want the dictionary to return the same key to everyone. – vgru Aug 03 '16 at 12:30
  • @Groo: ok, I first only noticed the GetOrAdd(TKey key, TValue value) overload, but after seeing the GetOrAdd(TKey key, Func valueFactory), I agree that this suits my requirements perfectly. The race condition is still present in that case, but not an issue, as the calculation is deterministic: same input (key) will always have the same result (value). – Henk Kok Aug 03 '16 at 12:56
4

As @SLaks indicated, your code has a race condition. ConcurrentDictionary was built to prevent such scenarios by providing methods that perform complex operations atomically, such as GetOrAdd and AddOrUpdate.

Here's a sequence that describes how your code could break:

  • Thread 1 executes TryGetValue which returns false
  • Thread 2 does the same
  • Both threads do the calculation
  • Thread 2 adds the value to the dictionary and returns the result
  • Thread 1 either
    • (using the indexer setter) adds another value, overwriting the previous one and returns it
    • (using TryAdd) doesn't add it and the method returns a result that's potentially different than the one in the dictionary
  • Thread 2 now has a result that's potentially different than the one in the dictionary

So you can see how not using GetOrAdd makes things a lot more complex and could have potentially disastrous results. It's possible that in your case if you're deterministically calculating a string from the key it won't matter - but there's really no reason not to use this method. It also simplifies the code, and may (marginally) improve performance since it calculates the hash-code just once.

Another conclusion is choosing between the indexer and TryAdd is much more a question of correctness than of performance.

That said, the performance of both the indexer [] and TryAdd methods is identical, since both share the same implementation (TryAddInternal), which works by taking the lock belonging to the key's bucket, then checking if the key exists in the dictionary, and either updating the dictionary or not.

Lastly, here's an example that shows how to build methods like GetOrAdd correctly.

Eli Arbel
  • 22,391
  • 3
  • 45
  • 71
  • Upvoted for actually comparing the indexer code with `TryAdd`, which answers the exact OP's question. – vgru Aug 03 '16 at 12:27
  • Accepted over @SLaks answer, as it links to an answer which clarifies the correct use of GetOrAdd. Partially my bad: I missed the correct overload after reading SLaks's answer. Please note that using GetOrAdd does not eliminate the race condition, but this is not an issue. – Henk Kok Aug 03 '16 at 13:01
  • @EliArbel: https://msdn.microsoft.com/en-us/library/dd997369(v=vs.110).aspx describes a scenario where a thread may return a (potentially different) result set by another thread. It's below the sample code. – Henk Kok Aug 03 '16 at 14:54