First, as mentioned in the comments, returning a reference (even a const one) from the lookup method is not correct. The referenced element may be modified by another thread, or it may be removed from the cache while it is still being used. You have to either return by value or design the interface so that the operations on the cache elements are carried out while the mutex is locked, for example by passing a function object to your operator(), as sketched below.
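One possible shape of that interface (only a sketch; it reuses your _mutex, _values and _func members and assumes the visitor is any callable that accepts the cached value):

template <typename Visitor>
auto operator()(Key const & key, Visitor && visit)
{
    std::lock_guard<std::mutex> lock(_mutex);
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
        value = _values.emplace_hint(value, key, _func(key));
    // The caller's function object runs while the mutex is held,
    // so the element cannot be modified or evicted underneath it.
    return visit(value->second);
}

A caller would then write cache(key, [](Value const & v) { /* use v under the lock */ }); and nothing escapes the critical section.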
As for possible optimizations, the C++ standard library doesn't offer much beyond a mutex. You can try std::shared_mutex, so that cache hits (which do not modify the data structure) can be processed in parallel. This makes cache misses more expensive, since you have to re-lock the mutex in exclusive mode and perform the element lookup again:
Value operator()(Key const & key)
{
    // Try the fast path first
    {
        std::shared_lock<std::shared_mutex> lock(_mutex);
        auto value = _values.find(key);
        if (value != _values.end())
            return value->second;
    }

    // Handle the cache miss
    std::lock_guard<std::shared_mutex> lock(_mutex);
    // Search for the element again, as it could have been added by another thread
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
    {
        value = _values.emplace_hint(value, key, _func(key));
    }
    return value->second;
}
You can improve this slightly by using upgrade_mutex from Boost.Thread. It lets you take an upgradeable lock that coexists with shared readers and atomically convert it to an exclusive one, avoiding the second element lookup.
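A sketch of how that might look (using boost::shared_mutex here, which models the same UpgradeLockable concept as upgrade_mutex; it assumes _mutex is changed accordingly). The caveat is that only one thread may hold upgrade ownership at a time, so calls taking this path are serialized against each other, although they still run concurrently with plain shared locks:

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

Value operator()(Key const & key)
{
    // Upgrade ownership coexists with shared readers,
    // but only one thread may hold it at a time.
    boost::upgrade_lock<boost::shared_mutex> lock(_mutex);
    auto value = _values.lower_bound(key);
    if (value == _values.end() || value->first != key)
    {
        // Atomically convert upgrade ownership to exclusive ownership;
        // no other writer can slip in, so the lookup above is still valid.
        boost::upgrade_to_unique_lock<boost::shared_mutex> unique(lock);
        value = _values.emplace_hint(value, key, _func(key));
    }
    return value->second;
}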
Further, you can consider the Threading Building Blocks (TBB) library. In particular, concurrent_lru_cache and concurrent_unordered_map seem relevant.
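As an illustration, a minimal memoizing lookup on top of tbb::concurrent_unordered_map could look like the sketch below (it assumes _values becomes a tbb::concurrent_unordered_map<Key, Value> member). Note the trade-offs: two threads that miss the same key at the same time may both call _func, with only one result being kept, and the container has no concurrency-safe erase, which matters if you need eviction:

#include <tbb/concurrent_unordered_map.h>

Value operator()(Key const & key)
{
    // Lookups and insertions are thread-safe without an external mutex.
    auto value = _values.find(key);
    if (value != _values.end())
        return value->second;
    // On a concurrent miss both threads may compute _func(key);
    // insert() keeps the first result and discards the duplicate.
    return _values.insert(std::make_pair(key, _func(key))).first->second;
}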
Lastly, if all of the above is not enough, you can look for a paper and implement a lock-free structure yourself. I mean it when I say this is a last-resort option: lock-free programming is difficult, and you should try hard to avoid it and rely on existing solutions, which are written by experts and are well tested. Also, some lock-free algorithms may be covered by patents.
As a general side note, you may want to rethink your design. It may be possible to remove the thread contention bottleneck by eliminating the cache: in a massively threaded environment it may be more efficient to re-acquire the cached values from the original source than to have all the threads contend for a common cache. Or you could at least reduce the negative effect of contention by having multiple cache instances, one per small group of threads (or one per thread), as in the sketch below.
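The per-thread variant needs no locking at all (again only a sketch; a thread_local cache is shared by all instances of the class within one thread, every thread pays for its own misses, and memory use grows with the number of threads):

Value operator()(Key const & key)
{
    // One private map per thread; no synchronization required.
    thread_local std::map<Key, Value> cache;
    auto value = cache.lower_bound(key);
    if (value == cache.end() || value->first != key)
        value = cache.emplace_hint(value, key, _func(key));
    return value->second;
}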