
I'm working in a multithreaded environment. Basically, I have an unordered_map that can be accessed by multiple threads at the same time. Right now, my algorithm is:

function foo(key) {
  scoped_lock()
  if key exists {
    return Map[key]
  }

  value = get_value()
  Map[key] = value
  return value
}

Obviously, performance is not good with this implementation. Is there an algorithm/approach that I can use to improve it?

EDIT:

I did a number of tests and I thought about the double-checked locking. So, I modified the code with:

function foo(key) {
  if key exists {
    return Map[key]
  }

  scoped_lock()
  if key exists {
    return Map[key]
  }

  value = get_value()
  Map[key] = value
  return value
}

Actually, I only added another check before the scoped_lock(). In this scenario, suppose the function is called N times. If the first m calls to foo (m < N) fill the map and the next N - m calls only read values from it, those reads don't need exclusive access. Moreover, the second check after the scoped_lock ensures thread safety. Am I right? In any case, the first version takes ~208s to execute, while the second takes ~200s.

Surcle

1 Answer


Here is a utility class:

template<class T, class M=std::mutex, template<class...>class S=std::unique_lock, template<class...>class U=std::unique_lock>
struct mutex_guarded {
  template<class F>
  auto read( F&& f ) const
  -> typename std::result_of<F&&(T const&)>::type
  {
    auto l = lock();
    return std::forward<F>(f)(data);
  }
  template<class F>
  auto write( F&& f )
  -> typename std::result_of<F&&(T&)>::type
  {
    auto l = lock();
    return std::forward<F>(f)(data);
  }
  mutex_guarded(mutex_guarded&&)=delete;
  mutex_guarded& operator=(mutex_guarded&&)=delete;

  template<class...Args>
  mutex_guarded( Args&&...args ):
    data( std::forward<Args>(args)... )
  {}
private:
  mutable M m;
  T data;

  U<M> lock() { return U<M>(m); }       // exclusive lock for write()
  S<M> lock() const { return S<M>(m); } // (possibly shared) lock for read()
};

It, especially in C++14, lets you interact with a mutex-protected instance of data in an easy-to-write way.

In C++14 you can use std::shared_timed_mutex, and in C++17 you can use std::shared_mutex, like this:

template<class T>
using rw_guarded = mutex_guarded< T, std::shared_mutex, std::shared_lock >;

This enables having many readers at once. But you should first determine whether the simple mutex version is fast enough.

struct cache {
  using Key=std::string;
  using Value=int;
  using Map=std::unordered_map< Key, Value >;
  Value get( Key const& k ) {
    // Fast path: shared access. A pointer into an unordered_map stays
    // valid across later inserts (only erasing the element invalidates
    // it), so it may be dereferenced after the lock is released.
    Value const* r = table.read([&](Map const& m)->Value const*{
      auto it = m.find(k);
      if (it == m.end()) return nullptr;
      return std::addressof( it->second );
    });
    if (r) return *r;
    // Slow path: exclusive access, re-checking in case another thread
    // inserted the key first.
    return table.write([&](Map& m)->Value{
      auto it = m.find(k);
      if (it != m.end()) return it->second;
      auto r = m.insert( std::make_pair(k, 42) ); // construct data here
      return r.first->second;
    });
  }
private:
  mutex_guarded< Map > table;
};

Upgrade mutex_guarded to rw_guarded and it switches to reader-writer locks.


Here is a more complex version:

Have two maps: one from key to value, one from key to shared future of value.

Use a reader writer lock (aka shared mutex).

To get: take a shared lock on the first map and check whether the value is there. If it is, return it.

Otherwise, unlock the first map and lock the second map for writing. If there isn't already a shared future under the key, add one. Unlock map 2. If you added the future, compute the value (with no map locked) and fulfill it. Then wait on the shared future regardless of whether you added it.

When done, lock the first map for reading and check whether the result is already there. If yes, return it. If not, unlock, relock for writing, move the data into map 1 if it isn't already there, and return the data from the first map.

This is designed to minimize the period map 1 is locked for exclusively, allowing max concurrency there.

Other designs will optimize other considerations.

Do not use operator[]. Do not interact with any map without some kind of lock active. Know which locks correspond to which map. Note that reading elements (as opposed to looking them up) can be done without a lock in some cases. Sometimes you need to read a copy of a shared thing, not the shared thing itself. Look up the docs of every type to determine which operations need which locks.

Yakk - Adam Nevraumont
  • Thank you for your reply. The idea sounds good even if I don't understand it completely. I was wondering if there is an algorithm that I can follow. – Surcle Mar 14 '18 at 08:35
  • I used a second approach. What do you think about it? – Surcle Mar 14 '18 at 09:01
  • @surcle it causes undefined behaviour due to lack of synchronization. So not good. – Yakk - Adam Nevraumont Mar 14 '18 at 09:58
  • OK. So even with the lock after the double check, is it not enough? I thought I could use that approach since the element is not modified. – Surcle Mar 14 '18 at 10:03
  • However, I read that the shared lock could be expensive, so with that approach the performance could be worse. It's strange that there isn't a kind of "standard approach" for this problem. Is what you explained in the post your own algorithm, or did you read it somewhere? I'd like to understand it deeply. – Surcle Mar 14 '18 at 10:08
  • Did you suggest a shared future so that other interested readers may wait on that future while the prospective writer generates the value? If not, what's the motivation for using it? – einpoklum Mar 14 '18 at 10:08
  • @einpo Suppose 3 threads at once all want to create it. One actually does. All 3 wait on it finishing with no locks on anything but it. When it finishes, the three race to lock and copy it over to the value-map. The first one does the copy, the other 2 spot the copy and just return. – Yakk - Adam Nevraumont Mar 14 '18 at 10:15
  • @surcle no, it is something I threw together. SRW lock is the standard approach; mine is modified to assume creating objects is medium-expensive and you cannot afford to block other threads while doing so. If creating is cheap, a simpler solution exists. – Yakk - Adam Nevraumont Mar 14 '18 at 10:17
  • @Yakk: But suppose the writing is near-instantaneous, e.g. the value is 8 bytes, or fits in a cache line. The two latter prospective writers have to wait on the global write lock for the second map anyway, before they can wait on the future. And if we had no future, then they would immediately notice there's already a value, and fail. Or... ah, you're thinking about the case of overwrites rather than just filling the map. I see. I'd still mention that's what the futures are for, and if the map is fill-only they're unnecessary. – einpoklum Mar 14 '18 at 10:41
  • @Yakk since I'm using a boost::shared_ptr as value to return, I think that the second solution that I posted is correct, right? – Surcle Mar 14 '18 at 16:22
  • @Surcle No it is not. – Yakk - Adam Nevraumont Mar 14 '18 at 16:49
  • @Yakk it's me again. I used a profiler, and the method that creates the object actually does not take that much time. I'm trying to understand how I can optimize the function based on this, but I don't see how to do it. – Surcle Mar 15 '18 at 08:27