I saw this question today about a performance difference between ConcurrentDictionary methods, and at first I saw it as a premature micro-optimization.
However, after some thought, I realized (if I am not mistaken) that each time we pass a lambda to a method, the CLR may need to allocate a delegate object, plus a closure object if the lambda captures anything, and then collect them some time later.
There are three possibilities:
Lambda without a closure:
    // the lambda should internally compile to a static method,
    // but will CLR instantiate a new ManagedDelegate wrapper or
    // something like that?
    return concurrent_dict.GetOrAdd(key, k => ValueFactory(k));
Lambda with a closure:
    // this is definitely an allocation
    return concurrent_dict.GetOrAdd(key, k => ValueFactory(k, stuff));
Outside check (like checking the condition before the lock):
    // no lambdas in the hot path
    if (!concurrent_dict.TryGetValue(key, out value))
        return concurrent_dict.GetOrAdd(key, k => ValueFactory(k));
    return value;
The third case is obviously allocation-free on the hot path (when the key is already present), and the second one will need an allocation on every call.
But is the first case (lambda which doesn't have a capture) completely allocation-free (at least in newer CLR versions)? Also, is this an implementation detail of the runtime, or something specified by the standard?
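For anyone who wants to check this empirically, here is a minimal sketch of how the three variants could be compared. It assumes a runtime where `GC.GetAllocatedBytesForCurrentThread()` is available (recent .NET versions). The class name `LambdaAllocationProbe`, the `Measure` helper, and the bodies of the two `ValueFactory` overloads are hypothetical and exist only for illustration. I deliberately give no expected numbers, since the result may depend on the compiler and runtime version.

```csharp
using System;
using System.Collections.Concurrent;

public static class LambdaAllocationProbe
{
    // Hypothetical cache and factories, standing in for the ones in the question.
    static readonly ConcurrentDictionary<int, string> Cache =
        new ConcurrentDictionary<int, string>();

    static string ValueFactory(int k) => k.ToString();
    static string ValueFactory(int k, string stuff) => stuff + k;

    // Case 1: lambda without a closure.
    public static string NoCapture(int key) =>
        Cache.GetOrAdd(key, k => ValueFactory(k));

    // Case 2: lambda that captures the 'stuff' parameter.
    public static string WithCapture(int key, string stuff) =>
        Cache.GetOrAdd(key, k => ValueFactory(k, stuff));

    // Case 3: check outside first, so the lambda is only reached on a miss.
    public static string CheckFirst(int key)
    {
        if (!Cache.TryGetValue(key, out var value))
            return Cache.GetOrAdd(key, k => ValueFactory(k));
        return value;
    }

    // Returns the bytes allocated on the current thread while running
    // 'action' the given number of times, after one warm-up call.
    public static long Measure(Action action, int iterations)
    {
        action(); // warm-up: populates the cache and any lazily created delegates
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        const int N = 100_000;
        // The delegates passed to Measure are created before measuring starts.
        Console.WriteLine($"no capture:   {Measure(() => NoCapture(1), N)} bytes");
        Console.WriteLine($"with capture: {Measure(() => WithCapture(1, "x"), N)} bytes");
        Console.WriteLine($"check first:  {Measure(() => CheckFirst(1), N)} bytes");
    }
}
```

All three calls hit an existing key after the warm-up, so any bytes reported should come from the call site itself (delegate and closure objects) rather than from the value factory.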