3

I have a big plain struct without any methods. It has many fields, including a container (std::vector). I need to make it atomic so that one producer thread and many consumer threads can access it concurrently. Consumers don't modify the data; they only read it. There is only one producer, which adds/modifies (but never removes) the data, so it must keep the data that consumers can see consistent at all times.

I think there are two solutions that could resolve this, but I don't know which one is better.

Method 1:

struct Single_t {
   size_t idx;
   double val;
};
using Details_t = std::vector<Single_t>;

struct BigStruct_1 {
   std::string      id;
   size_t           count;
   Details_t        details;
   std::atomic_bool data_ok { false };
};
std::map<std::string, BigStruct_1> all_structs1;

Method 2:

// Single_t and Details_t are same as the ones in Method1.
struct BigStruct_2 {
   std::string id;
   size_t      count;
   Details_t   details;
};
std::map<std::string, std::atomic<BigStruct_2> > all_structs2;

So far I prefer Method 2, because I don't need to check data_ok everywhere in the consumers' code. In fact each consumer holds a pointer to a single BigStruct_2, and I want the data to be accessible at any time (the data will never be removed). Fortunately there is only one producer in my system, so the producer can simply overwrite the data in one step: std::atomic<BigStruct_2>::store( tmp_pod );. That way I don't need to check whether the data is ready/OK when a consumer reads it (it should always be OK), and I also don't need std::atomic::compare_exchange_xxx in this case, so I don't have to provide a comparison between two BigStruct_2 objects. Am I right?

If I use Method 1, whenever I need to modify the data I have to write something like the following:

auto& one_struct = all_structs1["some_id"];
one_struct.data_ok.store( false, std::memory_order_release );
// change the data....
one_struct.data_ok.store( true, std::memory_order_release );

and then check data_ok everywhere in the consumers' code... It's ugly and dangerous (easy to forget a check).
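
For example, every consumer read would need something like this (a rough sketch; the helper name is only illustrative, and the check alone does not even make the plain reads race-free):

bool try_read_count( const BigStruct_1& s, size_t& out_count )
{
   // illustrative helper only, not from my real code
   if ( !s.data_ok.load( std::memory_order_acquire ) )
      return false;          // producer is rewriting, caller must retry later
   out_count = s.count;      // plain read; still racy if the producer writes right now
   return true;
}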

Any recommendations/hints will be appreciated!

Leon
  • As an initial matter, neither `BigPOD_0` nor `BigPOD_1` are POD structs, as they contain a member of non-POD types `std::string` and `Details_t` – Igor Tandetnik Nov 16 '20 at 04:12
  • Setting `BigPOD_X` aside for the moment, it's not clear how you plan to prevent data races on `all_pods0` / `all_pods1`. Those are shared data structures that don't appear to be protected from concurrent access in any way. It would help if you prepare a [mcve]. – Igor Tandetnik Nov 16 '20 at 04:19
  • Thanks a lot! It's my mistake, I will correct it. I wrote POD here because what I actually want to emphasize is that this struct has no methods, i.e. it is just a plain struct used to deliver data. – Leon Nov 16 '20 at 04:22

4 Answers

2

Another solution for updating shared objects that are predominately read is seqlock:

Seqlocks are an important synchronization mechanism and represent a significant improvement over conventional reader-writer locks in some contexts. They avoid the need to update a synchronization variable during a reader critical section, and hence improve performance by avoiding cache coherence misses on the lock object itself.
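
A minimal sketch of the seqlock protocol for a single writer and trivially copyable data (illustrative only; a production-quality seqlock needs atomic access to the payload itself to avoid formal data races):

#include <atomic>

template <typename T>
class SeqLock {
    std::atomic<unsigned> seq_{0};   // even: data stable, odd: write in progress
    T data_{};
public:
    void store(const T& value) {     // single writer only
        seq_.fetch_add(1, std::memory_order_relaxed);        // sequence becomes odd
        std::atomic_thread_fence(std::memory_order_release);
        data_ = value;
        seq_.fetch_add(1, std::memory_order_release);        // even again: write done
    }
    T load() const {                 // any number of readers
        for (;;) {
            unsigned s1 = seq_.load(std::memory_order_acquire);
            if (s1 & 1) continue;                             // writer active, retry
            T copy = data_;
            std::atomic_thread_fence(std::memory_order_acquire);
            if (seq_.load(std::memory_order_relaxed) == s1)
                return copy;                                  // no write interleaved
        }
    }
};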

Maxim Egorushkin
  • Thanks for your recommendation! The SeqLock is new to me; I have added it to my toolbox! But in this case I think it is not a good fit. If a SeqLock is used to synchronize operations on the std::map, a reader thread can still run into std::map::find() after getting a sequence number, instead of being blocked while a writer thread is writing/re-balancing the map's tree, so it may crash on dirty data. If the SeqLock only protects the consistency of BigStruct, then I still have to put the checking code everywhere in the consumers. Thank you anyway!!! – Leon Nov 25 '20 at 10:02
  • @Leon Obviously, seqlock is useless for non-trivial data structures that can be transiently in inconsistent state while being updated and hence cannot be read concurrently during that time. – Maxim Egorushkin Nov 25 '20 at 10:07
1

I suggest you make your BigStructs effectively immutable, and store them like this:

std::map<std::string, std::shared_ptr<BigStruct_2> > all_structs2;

In order to make an update, the writer-thread would allocate a new BigStruct_2 (using std::make_shared or whatever), set it equal to the object it wants to modify, then modify the new BigStruct_2, then insert the new shared_ptr<BigStruct_2> into the map.

Then whenever a reader-thread wants to read a BigStruct_2 it can do so safely by holding a local std::shared_ptr<BigStruct_2> that it retrieved from the std::map. Since your writer-thread is careful to never modify a BigStruct_2 that it has already inserted into the map, there is no chance that a reader-thread will be wrong-footed by a BigStruct_2's contents changing while it is in the process of reading it.

Note that none of the above makes accesses to the std::map itself thread-safe, so you'd still need to synchronize all of those with a mutex. Those actions should be quite fast, though, so I doubt that will be a problem.
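
A rough sketch of the above (the function names and map_mutex are just placeholders for whatever synchronization you already have around the map):

#include <map>
#include <memory>
#include <mutex>
#include <string>

std::mutex map_mutex;
std::map<std::string, std::shared_ptr<BigStruct_2>> all_structs2;

// Writer: copy the current object, modify the private copy, then publish it.
void update_count(const std::string& key, size_t new_count)
{
    std::shared_ptr<BigStruct_2> fresh;
    {
        std::lock_guard<std::mutex> lock(map_mutex);
        auto it = all_structs2.find(key);
        fresh = (it != all_structs2.end())
                  ? std::make_shared<BigStruct_2>(*it->second)   // copy the existing object
                  : std::make_shared<BigStruct_2>();
    }
    fresh->count = new_count;            // safe: no reader can see 'fresh' yet
    std::lock_guard<std::mutex> lock(map_mutex);
    all_structs2[key] = fresh;           // publish; the old object stays alive for readers
}

// Reader: take a local shared_ptr under the lock, then read without holding it.
std::shared_ptr<const BigStruct_2> get_snapshot(const std::string& key)
{
    std::lock_guard<std::mutex> lock(map_mutex);
    auto it = all_structs2.find(key);
    return (it != all_structs2.end()) ? it->second : nullptr;
}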

Jeremy Friesner
  • I have handled the race condition on the map itself by adding a mutex. For simplicity, I omitted it. Thank you very much! – Leon Nov 16 '20 at 06:05
1

std::atomic<BigStruct_2> will effectively be using a lock: is_always_lock_free and is_lock_free are very likely to be false on any implementation.

Also, you cannot use a non-trivially-copyable type in std::atomic anyway (i.e. no std::string in your struct).

Generally, the ability of std::atomic to work with types of any size exists for portability (in case some hardware does not support atomics of a particular size, but other hardware does). It is not meant to be used in the never-lock-free case.

The approach with data_ok is effectively lock-based as well. You'll need to retry when data_ok is false.

You'd better just use std::mutex to guard the data, or, if reads are prevalent, use std::shared_mutex to allow concurrent reads.
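
For example, something along these lines (C++17; the names are placeholders):

#include <map>
#include <shared_mutex>
#include <string>

std::shared_mutex rw_mutex;
std::map<std::string, BigStruct_2> all_structs;

// Producer: exclusive lock while modifying.
void write_struct(const std::string& key, const BigStruct_2& value)
{
    std::unique_lock lock(rw_mutex);   // blocks all readers
    all_structs[key] = value;
}

// Consumers: shared lock, many readers may hold it at once.
BigStruct_2 read_struct(const std::string& key)
{
    std::shared_lock lock(rw_mutex);
    return all_structs.at(key);
}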

Alex Guteniev
0

If you're looking to absolutely minimise the lock time, I would suggest a third approach - you have a map of items which are structs holding an atomic pointer to the current values, as opposed to a map of shared pointers.

The advantage of doing this is that you don't have to lock the map to update values; you only need to lock the map for additions and removals.

Locking the map is effectively a domain lock which would block all threads, when most of the time you only really need to block threads reading the map entry of interest.

You effectively perform "copy-on-write" with the data to modify it. So you have

struct Single_t {
   size_t idx;
   double val;
};
using Details_t = std::vector<Single_t>;

struct BigStruct_Container {
   std::string      id;       //maybe put this in the digest if immutable?
   size_t           count;
   Details_t        details;
};

struct BigStruct_Digest {
   std::atomic<BigStruct_Container*> current_value;
};
std::map<std::string, BigStruct_Digest> all_digests;

With this, to perform an update you take a copy of the data and generate a new container. Then you simply update the container pointer (digest) that's in the map. Since you're not changing the value in the map, the map remains constant so there is no need to lock the map when updating.
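
A rough sketch of that update path (illustrative names; reclamation of the old container is deliberately ignored here, and in practice needs hazard pointers, RCU, deferred deletion or similar):

// Producer only: copy the current container, modify the copy, then publish it atomically.
void update_count(BigStruct_Digest& digest, size_t new_count)
{
    auto* old_value = digest.current_value.load(std::memory_order_acquire);
    auto* fresh = new BigStruct_Container(*old_value);   // copy-on-write
    fresh->count = new_count;                            // modify the private copy
    digest.current_value.store(fresh, std::memory_order_release);
    // NOTE: 'old_value' is intentionally leaked in this sketch; readers may
    // still be using it, so it must be reclaimed by some deferred scheme.
}

// Consumers: a single atomic load yields a consistent container.
const BigStruct_Container* read_current(const BigStruct_Digest& digest)
{
    return digest.current_value.load(std::memory_order_acquire);
}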


A word of warning with using shared-ptrs; you need to use a mutex to protect the reference counting mechanism within since it is not thread-safe. If you're using domain locks that should be OK, but it's not an optimal solution.


Note that given the map doesn't contain much data, you could even perform copy-on-write with the map, and have an atomic-ptr to the map itself.....


This is along the lines of "lock individual members, not the collection" in the book "Seven Concurrency Models in Seven Weeks" - https://www.amazon.co.uk/Seven-Concurrency-Models-Weeks-Programmers/dp/1937785653

Den-Jason
  • The reference-counters used by `shared_ptr` use `std::atomic` functionality and are therefore thread-safe without requiring a mutex (although of course the data the shared_ptr points to is not made thread-safe simply because a shared_ptr points to it). – Jeremy Friesner Nov 23 '20 at 21:10
  • Nope - I've fallen victim to this myself. See https://stackoverflow.com/a/47117985/1607937 – Den-Jason Nov 23 '20 at 21:12
  • They have fixed this in C++20 with `atomic_shared_ptr` I believe – Den-Jason Nov 23 '20 at 21:12
  • I think @WhizTim's comment at your link contains the critical distinction; i.e. it's not safe to have two threads accessing the same `shared_ptr` simultaneously, but it is safe to have two `shared_ptr`'s, each accessed by a different thread, both simultaneously referencing the same object. – Jeremy Friesner Nov 23 '20 at 21:16