42

I am changing a single thread program into multi thread using boost:thread library. The program uses unordered_map as a hasp_map for lookups. My question is..

At one time many threads will be writing, and at another many will be reading but not both reading and writing at the same time i.e. either all the threads will be reading or all will be writing. Will that be thread safe and the container designed for this? And if it will be, will it really be concurrent and improve performance? Do I need to use some locking mechanism?

I read somewhere that the C++ Standard says the behavior will be undefined, but is that all?

UPDATE: I was also thinking about Intel concurrent_hash_map. Will that be a good option?

questions
  • 2,337
  • 4
  • 24
  • 39
  • 7
    "but is that all"... being UB should be enough to not do it. – PlasmaHH Mar 13 '12 at 14:02
  • Read this http://stackoverflow.com/questions/1362110/is-the-c-stl-stdset-thread-safe – xanatos Mar 13 '12 at 14:03
  • @PlasmaHH If it is implementation-specific it could be, if you want to write your code for that implementor. If you write code for Windows, the way Microsoft implements something could be enough for you. – xanatos Mar 13 '12 at 14:04
  • 2
    Read "behavior will be undefined" as will not work. – brian beuning Jul 17 '14 at 01:43
  • Re: `concurrent_hash_map`, I have used it and it works well. See [full sample code](https://stackoverflow.com/questions/60586122/tbbconcurrent-hash-mapk-v-sample-code-for-intel-threading-building-blocks-t). – Contango Mar 08 '20 at 10:06

7 Answers7

69

STL containers are designed so that you are guaranteed to be able to have:

A. Multiple threads reading at the same time

or

B. One thread writing at the same time

Having multiple threads writing is not one of the above conditions and is not allowed. Multiple threads writing will thus create a data race, which is undefined behavior.

You could use a mutex to fix this. A shared_mutex (combined with shared_locks) would be especially useful as that type of mutex allows multiple concurrent readers.

http://eel.is/c++draft/res.on.data.races#3 is the part of the standard which guarantees the ability to concurrently use const functions on different threads. http://eel.is/c++draft/container.requirements.dataraces specifies some additional non-const operations which are safe on different threads.

  • @EthanSteinberg- std::mutex? or boost:mutex? – questions Mar 13 '12 at 14:12
  • @questions They are all the same. Boost::mutex might be a bit easier to use as many more compilers support it than std::mutex. –  Mar 13 '12 at 14:14
  • @EthanSteinberg- I dont need to have any locks on readers, right? Because multiple threads can read at the same time. – questions Mar 14 '12 at 00:47
  • 3
    Do you have a source for the claim: Multiple threads are guaranteed to be able be able to read at the same time? – DrYap Sep 10 '15 at 22:47
  • 4
    to clarify I can not read from one thread and write from another at the same time? – Oleg Vazhnev Sep 24 '15 at 22:00
  • I thought the opposite was true—STL containers are not thread-safe. Can you provide a citation? – Mihai Danila Oct 11 '17 at 13:36
  • @MihaiDanila Well, they aren't thread safe in the sense that you can't have multiple writers and/or readers at the same time. You can however have multiple readers at the same time. See http://eel.is/c++draft/res.on.data.races#3 for how the standard library isn't allowed to cause races when you use const (read only functions). http://eel.is/c++draft/container.requirements.dataraces even adds some additional guarantees which allow you to even perform some "non-const" operations safely. –  Oct 11 '17 at 16:45
21

std::unordered_map meets the requirements of Container (ref http://en.cppreference.com/w/cpp/container/unordered_map). For container thread safety see: http://en.cppreference.com/w/cpp/container#Thread_safety.

Important points:

  • "Different elements in the same container can be modified concurrently by different threads"
  • "All const member functions can be called concurrently by different threads on the same container. In addition, the member functions begin(), end(), rbegin(), rend(), front(), back(), data(), find(), lower_bound(), upper_bound(), equal_range(), at(), and, except in associative containers, operator[], behave as const for the purposes of thread safety (that is, they can also be called concurrently by different threads on the same container)."
Ida
  • 3,417
  • 1
  • 11
  • 11
11

Will that be thread safe and the container designed for this?

No, the standard containers are not thread safe.

Do I need to use some locking mechanism?

Yes, you do. Since you're using boost, boost::mutex would be a good idea; in C++11, there's std::mutex.

I read somewhere that the C++ Standard says the behavior will be undefined, but is that all?

Indeed, the behaviour is undefined. I'm not sure what you mean by "is that all?", since undefined behaviour is the worst possible kind of behaviour, and a program that exhibits it is by definition incorrect. In particular, incorrect thread synchronisation is likely to lead to random crashes and data corruption, often in ways that are very difficult to diagnose, so you would be wise to avoid it at all costs.

UPDATE: I was also thinking about Intel concurrent_hash_map. Will that be a good option?

It sounds good, but I've never used it myself so I can't offer an opinion.

Mike Seymour
  • 249,747
  • 28
  • 448
  • 644
  • Re: `concurrent_hash_map`, I have used it and it works well. See [full sample code](https://stackoverflow.com/questions/60586122/tbbconcurrent-hash-mapk-v-sample-code-for-intel-threading-building-blocks-t). – Contango Mar 08 '20 at 10:05
3

The existing answers cover the main points:

  • you must have a lock to read or write to the map
  • you could use a multiple-reader / single-writer lock to improve concurrency

Also, you should be aware that:

  • using an earlier-retrieved iterator, or a reference or pointer to an item in the map, counts as a read or write operation

  • write operations performed in other threads may invalidate pointers/references/iterators into the map, much as they would if they were done in the same thread, even if a lock is again acquired before an attempt is made to continue using them...

Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
1

std::unordered_map is a good fit for some multi-threaded situations.

There are also other concurrent maps from Intel TBB:

  • tbb:concurrent_hash_map. It supports fine-grained, per-key locking for insert/update, which is something that few other hashmaps can offer. However, the syntax is slightly more wordy. See full sample code. Recommended.
  • tbb:concurrent_unordered_map. It is essentially the same thing, a key/value map. However, it is much lower level, and more difficult to use. One has to supply a hasher, a equality operator, and an allocator. There is no sample code anywhere, even in the official Intel docs. Not recommended.
Contango
  • 76,540
  • 58
  • 260
  • 305
1

You can use concurrent_hash_map or employ an mutex when you access unordered_map. one of issue on using intel concurrent_hash_map is you have to include TBB, but you already use boost.thread. These two components have overlapped functionality, and hence complicate your code base.

Chang
  • 3,953
  • 2
  • 30
  • 43
0

If you don't need all of the functionality of unordered_map, then this solution should work for you. It uses mutex to control access to the internal unordered_map. The solution supports the following methods, and adding more should be fairly easy:

  • getOrDefault(key, defaultValue) - returns the value associated with key, or defaultValue if no association exists. A map entry is not created when no association exists.
  • contains(key) - returns a boolean; true if the association exists.
  • put(key, value) - create or replace association.
  • remove(key) - remove association for key
  • associations() - returns a vector containing all of the currently known associations.

Sample usage:

/* SynchronizedMap
** Functional Test
** g++ -O2 -Wall -std=c++11 test.cpp -o test
*/
#include <iostream>
#include "SynchronizedMap.h"

using namespace std;
using namespace Synchronized;

int main(int argc, char **argv) {

    SynchronizedMap<int, string> activeAssociations;

    activeAssociations.put({101, "red"});
    activeAssociations.put({102, "blue"});
    activeAssociations.put({102, "green"});
    activeAssociations.put({104, "purple"});
    activeAssociations.put({105, "yellow"});
    activeAssociations.remove(104);

    cout << ".getOrDefault(102)=" << activeAssociations.getOrDefault(102, "unknown") << "\n";
    cout << ".getOrDefault(112)=" << activeAssociations.getOrDefault(112, "unknown") << "\n";

    if (!activeAssociations.contains(104)) {
        cout << 123 << " does not exist\n";
    }
    if (activeAssociations.contains(101)) {
        cout << 101 << " exists\n";
    }

    cout << "--- associations: --\n";
    for (auto e: activeAssociations.associations()) {
        cout << e.first << "=" << e.second << "\n";
    }
}

Sample output:

.getOrDefault(102)=green
.getOrDefault(112)=unknown
123 does not exist
101 exists
--- associations: --
105=yellow
102=green
101=red

Note1: The associations() method is not intended for very large datasets as it will lock the table during the creation of the vector list.

Note2: I've purposefully not extended unordered_map in order to prevent my self from accidentally using a method from unordered_map that has not been extended and therefore might not be thread safe.

#pragma once
/*
 * SynchronizedMap.h
 * Wade Ryan 20200926
 * c++11
 */

#include <unordered_map>
#include <mutex>
#include <vector>

#ifndef __SynchronizedMap__
#define __SynchronizedMap__

using namespace std;


namespace Synchronized {

    template <typename KeyType, typename ValueType>

    class SynchronizedMap {

    private:
        mutex sync;
        unordered_map<KeyType, ValueType> usermap;    

    public:
        ValueType getOrDefault(KeyType key, ValueType defaultValue) {
            sync.lock();

            ValueType rs;

            auto value=usermap.find(key);
            if (value == usermap.end()) {
                rs = defaultValue;
            } else {
                rs = value->second;
            }
            sync.unlock();
            return rs;
        }

        bool contains(KeyType key) {
            sync.lock();
            bool exists = (usermap.find(key) != usermap.end());
            sync.unlock();
            return exists;
        }

        void put(pair<KeyType, ValueType> nodePair) {
            sync.lock();

            if (usermap.find(nodePair.first) != usermap.end()) {
                usermap.erase(nodePair.first);
            }
            usermap.insert(nodePair);
            sync.unlock();
        }

        void remove(KeyType key) {
            sync.lock();

            if (usermap.find(key) != usermap.end()) {
                usermap.erase(key);
            }
            sync.unlock();
        }

        vector<pair<KeyType, ValueType>> associations() {
            sync.lock();
 
            vector<pair<KeyType, ValueType>> elements;

            for (auto it=usermap.begin(); it != usermap.end(); ++it) {
                pair<KeyType, ValueType> element (it->first, it->second);
                elements.push_back( element );
            }

            sync.unlock();
            return elements;
        }
    };       
}

#endif
wryan
  • 1,371
  • 1
  • 9
  • 4