3

I have an unordered_map<int, float> localCostMap describing the costs (float) between node IDs (int). The calculation of the second value is quite complex, but due to the structure of graph (directed, acyclic, up to two parent nodes) I can save many calculations by pooling the maps for each node into another map like so:

unordered_map<int, shared_ptr<unordered_map<int, float>>> pooledMaps.

Once the values are written (into localCostMap), they do not get updated again, but the calculations required the map entries of the connected nodes, which may leed to lock-ups.

How can I make it so that I can read the values stored in the inner map while also safely adding new entries (e.g. { 3, 1.23 }? I'm new to multithreading and have tried to search for solutions, but the only results I got were older, despite reading that multithreading has improved much, particularly in C++20.

Thank you in advance!

Edit: As requested, here is a minimal working example. Of course, the full algorithm is more complex, considers the edge cases and enters them also for other applicable nodes (e.g. the result of comparing 5 & 7 also applies for 6 & 7).

// Example.h

#pragma once

#include <iostream>
#include <unordered_map>
#include <thread>

struct MyNode {
  int id;
  int leftNodeID;
  int rightNodeID;
  std::shared_ptr<std::unordered_map<int, float>> localCostMap; /* inherited from parent (left/right) nodes */
  std::unordered_map<int, std::shared_ptr<std::unordered_map<int, float>>> pooledMaps;

  MyNode() : id(0), leftNodeID(0), rightNodeID(0) { setLocalCostMap(); }
  MyNode(int _id, int leftID, int rightID) : 
    id(_id), leftNodeID(leftID), rightNodeID(rightID) { setLocalCostMap(); }

  void setLocalCostMap();

  float calculateNodeCost(int otherNodeID);
};
// Example.cpp

#include "NodeMapMin.h"

MyNode nodes[8];

void MyNode::setLocalCostMap() {
    if (leftNodeID == 0) { // rightNodeID also 0
      localCostMap = std::make_shared<std::unordered_map<int, float>>();
    }
    else {  // get map from connectednode if possible
      auto poolmap = nodes[leftNodeID].pooledMaps.find(rightNodeID);
      if (poolmap == nodes[leftNodeID].pooledMaps.end()) {
        localCostMap = std::make_shared<std::unordered_map<int, float>>();
        nodes[leftNodeID].pooledMaps.insert({ rightNodeID, localCostMap }); // [1] possible conflict
        nodes[rightNodeID].pooledMaps.insert({ leftNodeID, localCostMap }); // [1] possible conflict
      }
      else { localCostMap = poolmap->second; }
    }
  }

float MyNode::calculateNodeCost(int otherNodeID) {  
  if (id > 0) {
    std::cout << "calculateNodeCost for " << nodes[id].id << " and " << nodes[otherNodeID].id <<  std::endl;
  }
  float costs = -1.0f;
  auto mapIterator = localCostMap->find(otherNodeID);
  if (mapIterator == localCostMap->end()) {
    if (id == otherNodeID) { // same node
      std::cout << "return costs for " << id << " and " << otherNodeID << " (same node): " << 0.0f << std::endl;
      return 0.0f;
    }
    else if (leftNodeID == 0 || nodes[otherNodeID].leftNodeID == 0) {
      costs = ((float)(id + nodes[otherNodeID].id)) / 2;
      std::cout << "calculated costs for " << id << " and " << otherNodeID << " (no connections): " << costs << std::endl;
    }
    else if (leftNodeID == nodes[otherNodeID].leftNodeID && 
      rightNodeID == nodes[otherNodeID].rightNodeID) { // same connected nodes
      costs = nodes[leftNodeID].calculateNodeCost(rightNodeID); // [2] possible conflict
      std::cout << "return costs for " << id << " and " << otherNodeID << " (same connections): " << costs << std::endl;
      return costs;
    }
    else {
      costs = nodes[leftNodeID].calculateNodeCost(otherNodeID) +
        nodes[rightNodeID].calculateNodeCost(otherNodeID) +
        nodes[id].calculateNodeCost(nodes[otherNodeID].leftNodeID) +
        nodes[id].calculateNodeCost(nodes[otherNodeID].rightNodeID); // [2] possible conflict

      std::cout << "calculated costs for " << id << " and " << otherNodeID << ": " << costs << std::endl;
    }


      // [3] possible conflict
    localCostMap->insert({ otherNodeID, costs });
    nodes[otherNodeID].localCostMap->insert({ id, costs });
  }
  else {
    costs = mapIterator->second;
    std::cout << "found costs for " << id << " and " << otherNodeID << ": " << costs << std::endl;
  }

  return costs;
}

float getNodeCost(int node1, int node2) {
  return nodes[node1].calculateNodeCost(node2);
}

int main()
{
  nodes[0] = MyNode(0, 0, 0); // should not be used
  nodes[1] = MyNode(1, 0, 0); 
  nodes[2] = MyNode(2, 0, 0);
  nodes[3] = MyNode(3, 0, 0);
  nodes[4] = MyNode(4, 0, 0);
  nodes[5] = MyNode(5, 1, 2);
  nodes[6] = MyNode(6, 1, 2);
  nodes[7] = MyNode(7, 3, 4);

  //getNodeCost(5, 7);
  //getNodeCost(6, 7);
  std::thread w1(getNodeCost, 5, 7);
  std::thread w2(getNodeCost, 6, 7);
  w1.join();
  w2.join();
  std::cout << "done";
}

I commented out the single-thread variant, but you can easily see the difference as the multi-threaded version already has more (unneccessary) comparisons.

As you can see, whenever two "noteworthy" comparisons take place, the result is added to localCostMap, which is normally derived from the connected two nodes. Thus, one insert is necessary for all nodes with these two connections (left & right).

I see at least 3 problematic point:

  • When initializing the node and inserting the pooled maps for the connected nodes: if two nodes with the same connections were to be added at the same time, they would both want to create and add the maps for the connected nodes. [1]
  • When calculating the values, another thread might already be doing it, thus leading to unneccessary calculations. [2]
  • When inserting the results into localCostMap (and by that also to the maps of the connected nodes). [3]
Markstar
  • 639
  • 9
  • 24
  • https://stackoverflow.com/q/16781886/3684343 It looks like you can add/remove elements to the outer map while using a reference to an inner map. – mch May 30 '22 at 11:56
  • 1
    Can you post a brief [mre] please? (Code speaks louder than words). Then, people should be able to tell you if it's safe (or, perhaps, what is needed to make it safe). – Paul Sanders May 30 '22 at 12:16
  • Have you considered using some concurrent hash table? In my experience, `concurrent_unordered_map` provided by oneTBB (Intel TBB) is quite mature. It is efficient since it does not support erasure, which seems that you don't need (otherwise, there is also `concurrent_hash_map`). – Daniel Langr May 30 '22 at 12:27
  • @PaulSanders: I added the example, I hope it helps illustrating my point. It is clearly not safe right now, and also not optimal. – Markstar May 31 '22 at 11:06
  • @mark OK, thanks. Seems that Sam has addressed this in his rider to his answer below. – Paul Sanders May 31 '22 at 13:37
  • @PaulSanders: I'm still not entirely sure how exactly to do this, but I guess `std::shared_mutex` is a place to start. As I said, I'm new to multithreading and I'm not ashamed to admit that it is a bit overwhelming. – Markstar May 31 '22 at 14:03

1 Answers1

0

If you already have a std::shared_ptr to one of the inner maps it can be safely used, since, as you explained, once created it is never updated by any execution thread.

However since the outer map is being modified, all access to the outer map must be thread safe. None of the containers in the C++ library are thread safe, it is your responsibility to make them thread safe when needed. This includes threads that only access the outer map, since other execution threads might be modifying it. When something is modified all execution threads are required to use thread-safe access.

This means holding a mutex lock. The best way to avoid bugs that involve thread safety is to make it logically impossible to access something without holding a lock. And the most direct way of enforcing this would be the map to be wrapped as a private class member, with the only way to access it is to call a public method that grabs a mutex lock:

#include <unordered_map>
#include <memory>
#include <mutex>
#include <iostream>

class costs {

    std::unordered_map<int, std::shared_ptr<std::unordered_map<int, float>
                        >> m;

    std::mutex mutex;

public:
    template<typename T>
    auto operator()(T && t)
    {
        std::unique_lock lock{mutex};

        return t(m);
    }
};

int main() {

    costs c;

    // Insert into the map

    c([&](auto &m) {

        auto values=std::make_shared<std::unordered_map<int, float>>();

        (*values)[1]=2;

        m.emplace(1, values);
    });

    // Look things up in a map

    auto values=c([]
             (auto &m) -> std::shared_ptr<std::unordered_map<int, float>>
             {
                 auto iter=m.find(1);

                 if (iter == m.end())
                     return nullptr;

                 return iter->second;
             });

    // values can now be used, since nothing will modify it.

    return 0;
}

This uses some convenient features of modern C++, but can be implemented all the way back to C++11, with some additional typing.

The only way to access the map is to call the class's () operator which acquires a lock and calls the passed-in callable object, like a lambda, passing to it a reference to the outer map. The lambda can do whatever it wants with it, before returning, at which point the lock gets released.

It is not entirely impossible to defeat this kind of enforced thread safety, but you'll have to go out of your way to access this outer unordered map without holding a lock.

For completeness' sake you may need to implement a second () overload as a const class method.

Note that the second example looks up one of the inner maps and returns it, at which point it's accessible without any locks being held. Presumably nothing would modify it.

You should consider using maps of std::shared_ptr<const std::unordered_map<int, float> instead of std::shared_ptr<std::unordered_map<int, float>. This will let your C++ compiler enforce the fact that, presumably, once created these maps will never be modified. Like I already mentioned: the best way to avoid bugs is to make it logically impossible for them to happen.

Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
  • It seems I wasn't quite clear, so I added an example to my OP. The nested map does get written to all the time and is the one I'm more worried about. If I understand your code correctly, the mutex locks the whole outer map, which means nobody can access it in the meantime (not even read from it), correct? I was hoping to find a way to only lock the inner map while also, if possible, to read the already written values in it (which do not change). – Markstar May 31 '22 at 11:12
  • Then take the same approach, but simply extend it to the inner map. Just put the wrapper on the inner map, instead. And for this kind of a behavior, simply use `std::shared_mutex` instead, allowing multiple readers, but blocking only when modifying the map. See your C++ textbook for more information on how to use various kinds of mutexes. And you still need to implement thread safety, if needed, on the outer map too. – Sam Varshavchik May 31 '22 at 11:15