First, as mentioned in the comments, returning a reference (even a const one) from the lookup method is not correct. The referenced element may be modified by another thread, or it may be removed from the cache while it is still being used. You have to either return by value or design the interface so that the operations on the cache elements are carried out while the mutex is locked, for example by passing a function object to your operator(), as sketched below.
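One possible shape of that interface (only a sketch; it reuses your _mutex, _values and _func members and assumes the visitor is any callable that accepts the cached value):

template <typename Visitor>
auto operator()(Key const & key, Visitor && visit)
{
    std::lock_guard<std::mutex> lock(_mutex);
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
        value = _values.emplace_hint(value, key, _func(key));
    // The caller's function object runs while the mutex is held,
    // so the element cannot be modified or evicted underneath it.
    return visit(value->second);
}

A caller would then write cache(key, [](Value const & v) { /* use v under the lock */ }); and nothing escapes the critical section.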
As for possible optimizations, the C++ standard library doesn't offer much beyond a mutex. You can try std::shared_mutex, so that cache hits (which do not modify the data structure) can be processed in parallel. This makes cache misses more expensive, since you have to re-lock the mutex in exclusive mode and perform the element lookup again:
Value operator()(Key const & key)
{
    // Try the fast path first
    {
        std::shared_lock<std::shared_mutex> lock(_mutex);
        auto value = _values.find(key);
        if (value != _values.end())
            return value->second;
    }

    // Handle the cache miss
    std::lock_guard<std::shared_mutex> lock(_mutex);
    // Search for the element again, as it could have been added by another thread
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
    {
        value = _values.emplace_hint(value, key, _func(key));
    }
    return value->second;
}
You can improve this slightly by using upgrade_mutex from Boost.Thread. It lets you take an upgradeable lock that coexists with shared readers and atomically convert it to an exclusive one, avoiding the second element lookup.
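A sketch of how that might look (using boost::shared_mutex here, which models the same UpgradeLockable concept as upgrade_mutex; it assumes _mutex is changed accordingly). The caveat is that only one thread may hold upgrade ownership at a time, so calls taking this path are serialized against each other, although they still run concurrently with plain shared locks:

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

Value operator()(Key const & key)
{
    // Upgrade ownership coexists with shared readers,
    // but only one thread may hold it at a time.
    boost::upgrade_lock<boost::shared_mutex> lock(_mutex);
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
    {
        // Atomically convert upgrade ownership to exclusive ownership;
        // no other writer can slip in, so the lookup above is still valid.
        boost::upgrade_to_unique_lock<boost::shared_mutex> unique(lock);
        value = _values.emplace_hint(value, key, _func(key));
    }
    return value->second;
}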
Further, you can consider the Threading Building Blocks (TBB) library. In particular, concurrent_lru_cache and concurrent_unordered_map seem relevant.
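As an illustration, a minimal memoizing lookup on top of tbb::concurrent_unordered_map could look like the sketch below (it assumes _values becomes a tbb::concurrent_unordered_map<Key, Value> member). Note the trade-offs: two threads that miss the same key at the same time may both call _func, with only one result being kept, and the container has no concurrency-safe erase, which matters if you need eviction:

#include <tbb/concurrent_unordered_map.h>

Value operator()(Key const & key)
{
    // Lookups and insertions are thread-safe without an external mutex.
    auto value = _values.find(key);
    if (value != _values.end())
        return value->second;
    // On a concurrent miss both threads may compute _func(key);
    // insert() keeps the first result and discards the duplicate.
    return _values.insert(std::make_pair(key, _func(key))).first->second;
}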
Lastly, if all of the above is not enough, you can look for a paper and implement a lock-free structure yourself. I mean it when I say this is a last-resort option: lock-free programming is difficult, and you should try hard to avoid it and rely on existing solutions, which are written by experts and are well tested. Also, some lock-free algorithms may be covered by patents.
As a general side note, you may want to rethink your design. It may be possible to remove the thread contention bottleneck by eliminating the cache: in a massively threaded environment it may be more efficient to re-acquire the cached values from the original source than to have all the threads contend for a common cache. Or you could at least reduce the negative effect of contention by having multiple cache instances, one per small group of threads (or one per thread), as in the sketch below.
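The per-thread variant needs no locking at all (again only a sketch; a thread_local cache is shared by all instances of the class within one thread, every thread pays for its own misses, and memory use grows with the number of threads):

Value operator()(Key const & key)
{
    // One private map per thread; no synchronization required.
    thread_local std::map<Key, Value> cache;
    auto value = cache.lower_bound(key);
    if (value == cache.end() || value->first != key)
        value = cache.emplace_hint(value, key, _func(key));
    return value->second;
}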