1

I want to cache calculation results in a ConcurrentDictionary<TKey,TValue>. Several threads may query the cache for an entry and generate it if it does not exist. Since GetOrAdd(TKey, Func<TKey,TValue>) is not atomic, I think I should use GetOrAdd(TKey, TValue) with Task<CacheItem> as TValue. So, when a thread wants to query a cache item, it generates a cold task coldTask, that is a task, which is not started, and potentially generates the the item, calls var cacheTask = cache.GetOrAdd(key, coldTask) for some key object, and then checks whether cacheTask is started or even has a result. If cacheTask is not started, the calling thread starts the task.

Is this a valid approach in principle?

One problem that remains is that

if(cacheTask.Status == Status.Created)
  cacheTask.Start();

is not atomic, so the cacheTask may be started from another thread, before cacheTask.Start() is called here. Is

try {
if(cacheTask.Status == Status.Created)
  cacheTask.Start();
} catch {}

a valid workaround?

MarkusParker
  • 1,264
  • 2
  • 17
  • 35
  • 1
    Did you try to use 'lock'? – Mixxiphoid Mar 16 '22 at 08:26
  • Define "valid". I would create a wrapper object with a lock, so that you don't have this problem in the first place. – Lasse V. Karlsen Mar 16 '22 at 08:28
  • Do you mean something like: if(cacheTask.Status == Status.Created) { lock(cacheTask) { if(cacheTask.Status == Status.Created) cacheTask.Start(); }} – MarkusParker Mar 16 '22 at 08:31
  • "Valid" means good style, efficient, effective – MarkusParker Mar 16 '22 at 08:34
  • Related: [ConcurrentDictionary GetOrAdd async](https://stackoverflow.com/questions/54117652/concurrentdictionary-getoradd-async) – Theodor Zoulias Mar 16 '22 at 08:49
  • 1
    Not sure this is the best approach in general, but instead of checking the task status, check whether the returned task is the instance that *this thread* just tried to add. If it is, then it's *this thread's* job to complete the task. No race. – Damien_The_Unbeliever Mar 16 '22 at 09:05
  • And rather than using cold tasks, I'd suggest just using a task obtained from `TaskCompletionSource` - as again, one thread can determine that it needs to complete the task and can achieve that directly rather than handing off to a separate `Task`. – Damien_The_Unbeliever Mar 16 '22 at 09:07
  • @Damien_The_Unbeliever using cold tasks is rarely a good idea, but this specific case might be one of those rare exceptions to the rule! Using a `TaskCompletionSource` will just make to code more verbose IMHO. – Theodor Zoulias Mar 16 '22 at 09:37
  • @TheodorZoulias - well, I constructed an answer from my comments. Do you really consider it more verbose than a separate `Task`? – Damien_The_Unbeliever Mar 16 '22 at 09:39
  • MarkusParker aren't you concerned about the possibility of storing faulted tasks in the cache? Caching exceptions instead of actual results is not desirable in general. How are you handling this case? – Theodor Zoulias Mar 16 '22 at 10:05
  • @TheodorZoulias In my case I cache calculation results. I don't expect the calculation to fail. If it fails nevertheless, the whole thing may fail as well. – MarkusParker Mar 16 '22 at 10:40

2 Answers2

2

The principle should be fine, to start the task you should be able to do something like:

var newTask = new Task(...);
var dictionaryTask = myDictionary.GetOrAdd(myKey, newTask);
if(dictionaryTask  == newTask){
    newTask.Start();
}
return await dictionaryTask;

That should ensure that only the thread that created the task starts it.

I would suggest checking out Lazy<T> since it is somewhat related. I would also suggest doing some bench-marking, since the most appropriate approach will depend on your specific use case. Keep in mind that async/await, or blocking, a task will have some overhead, so it will depend on the cost of generating values, and the frequency this is done at.

JonasH
  • 28,608
  • 2
  • 10
  • 23
  • The [`Task.Start`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.start) method schedules the execution on the `TaskScheduler.Current`, which has [known problems](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca2008 "CA2008: Do not create tasks without passing a TaskScheduler"). I would suggest configuring explicitly the `scheduler` (pass the `TaskScheduler.Default` as argument). – Theodor Zoulias Mar 16 '22 at 09:53
1

As I suggested in the comments, I'd use TaskCompletionSource<TResult> and reference equality to avoid races and unnecessary additional tasks to be scheduled:

var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if(ReferenceEquals(actualTask, tcs.Task))
{
    //Do the actual work here
    tcs.SetResult(new CacheItem());
}
return actualTask;

If generation can fail then the //Do the actual work here section should be wrapped in a try/catch and SetException should be used on the completion source (to indicate to any existing waiters that the failure has occurred). But then you have to consider what it means for that failed entry in the cache, whether to remove or retry, etc, and all of the complexity that arises from trying to build a cache in the first place.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448