I need to iterate almost constantly over a sequence of structs in a read-only fashion, but for every 1M+ reads, one of the threads may append an item. I think using a mutex would be overkill here, and I have also read that r/w locks have their own drawbacks for the readers.

I was thinking about using reserve() on a std::vector, but the answer to Iterate over STL container using indices safe way to avoid using locks? seemed to invalidate that.

Any ideas on what way might be fastest? The most important thing is for the readers to be able to quickly and efficiently iterate with as little contention as possible. The writing operations aren't time-sensitive.

Update: Another one of my use cases is that the "list" could contain pointers rather than structs, i.e. std::vector<MyClass*>. The same requirements apply.

Update 2: Hypothetical example

globally accessible:

typedef std::vector<MyClass*> Vector;
Vector v;
v.reserve(50);

Reader threads 1-10: (these run pretty much all the time)

.
.
int total = 0;
for (Vector::const_iterator it = v.begin(); it != v.end(); ++it)
{
   MyClass* ptr = *it;
   total += ptr->getTotal();
}
// do something with total
.
.

Writer threads 11-15:

MyClass* ptr = new MyClass();
v.push_back(ptr);

That's basically what happens here. Threads 1-15 could all be running concurrently, although generally there are only 1-2 reader threads and 1-2 writer threads.

stgtscc
  • How large are those vectors? Do they have to be arrays/vectors? Do you need memory block continuity? Can you change it to, e.g., a list-of-elements or a list-of-blocks-of-1024-elements? – quetzalcoatl Mar 13 '13 at 14:06
  • Is the number of concurrently-existing structs bounded? Are they ever removed, or mutated in-place, or do you really just read the same values over and over? How many readers are there? – Useless Mar 13 '13 at 14:25
  • @quetzalcoatl The vectors range from about 2-20 items with the average being around 7. They don't need memory continuity, but I think having that continuity might be preferable w.r.t. cache locality. I'm open to any suggestions. – stgtscc Mar 13 '13 at 15:05
  • @Useless It's not bounded but typically falls within the range specified above. Never removed; not sure what you mean by "mutated in place", but the structs are definitely mutable and do change. – stgtscc Mar 13 '13 at 15:06
  • OK, if the size isn't bounded above, you can't reserve a vector and _guarantee_ it'll never need re-allocation. So, you need something that can grow. Could multiple "readers" try to grow the list concurrently? – Useless Mar 13 '13 at 15:13
  • @stgtscc if the structs change then they need their own synchronization (independent of the synchronization you're using to change the size of the list) – SoapBox Mar 13 '13 at 15:14
  • Oh, and you don't say when & how items are _removed_ from the container. Is the read operation exclusive/destructive? – Useless Mar 13 '13 at 15:15
  • @Useless I don't mind adding a bounded size if it helps achieve my goal. No, only a single thread can append to the "list" – stgtscc Mar 13 '13 at 15:21
  • @SoapBox The structs/pointers themselves wouldn't need synchronization in my case but thanks for pointing that out. – stgtscc Mar 13 '13 at 15:22
  • One possible way: there is a readers' copy and a writers' copy of the data; readers read their copy, writers update theirs (with a lock); once in a while (say every 10k reader iterations) the readers' copy gets updated from the writers' copy. Readers will wait until this is done. The drawback is that all readers will have to wait until the last one completes its cycle. If consistency between readers is not required at all times, then each reader can switch independently, but then you'll need more copies of the data. – n. m. could be an AI Mar 13 '13 at 15:39
  • @n.m. This sounds a lot like CopyOnWriteArrayList from Java. How would you propose implementing such a facility? Isn't just keeping track of "when" to update, plus the synchronization, already overkill? – stgtscc Mar 13 '13 at 15:50
  • You still haven't addressed removal. Is each "reader" supposed to _remove_ items from the container, or just read it in-place and let other threads read it too? _Are_ items ever removed? – Useless Mar 13 '13 at 16:11
  • Reader: loop 10k times without locks, signal event A, go to looping with normal or RW mutex checking for event B each time. Writers: wait until every reader signals event A, make an update (or decide there's no update), signal event B, repeat. When readers get B, they switch back to lock-less mode. – n. m. could be an AI Mar 13 '13 at 21:55

2 Answers


What I think could work here is your own implementation of a vector, something like this:

#include <memory>
#include <vector>

template <typename T> class Vector
{
// constructor will be needed of course
public:
    // readers take a snapshot; it stays valid for as long as they hold it
    std::shared_ptr<const std::vector<T> > getVector()
        { return mVector; }
    void push_back(const T&);

private:
    std::shared_ptr<std::vector<T> > mVector;
};

Then, whenever readers need to access a specific Vector, they should call getVector() and keep the returned shared_ptr until finished reading.
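For example, a reader loop in the question's scenario might look like this (a sketch; MyClass, getTotal() and v are taken from the question's example):

std::shared_ptr<const std::vector<MyClass*> > snapshot = v.getVector();
int total = 0;
for (std::vector<MyClass*>::const_iterator it = snapshot->begin(); it != snapshot->end(); ++it)
    total += (*it)->getTotal();
// 'snapshot' keeps the old storage alive even if a writer swaps in new storage meanwhile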

But writers should always use Vector's push_back to add new values. This push_back should then check if mVector->size() == mVector->capacity() and, if true, allocate a new vector and assign it to mVector. Something like:

template <typename T> void Vector<T>::push_back(const T& t)
{
    if (mVector->size() == mVector->capacity())
    {
        // make certain here that new_size > old_size
        std::shared_ptr<std::vector<T> > vec(new std::vector<T>());
        vec->reserve(mVector->size() * SIZE_MULTIPLIER);

        std::copy(mVector->begin(), mVector->end(), std::back_inserter(*vec));
        mVector = vec; // the old storage stays alive while readers still hold it
    }
    mVector->push_back(t); // 'mVector' is guaranteed not to reallocate now
}

The idea here is inspired by RCU (read-copy-update) algorithm. If storage space is exhausted, the new storage should not invalidate the old storage as long as there is at least one reader accessing it. But, the new storage should be allocated and any reader coming after allocation, should be able to see it. The old storage should be deallocated as soon as no one is using it anymore (all readers are finished).

Since most HW architectures provide some way to have atomic increments and decrements, I think shared_ptr (and thus Vector) will be able to run completely lock-less.
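One caveat worth a sketch: atomic reference counting protects the count itself, but getVector() and push_back also read and replace the shared_ptr member concurrently, and that access must be atomic too. C++11 provides free-function overloads std::atomic_load and std::atomic_store for shared_ptr that could be used here (an illustration, not part of the code above):

std::shared_ptr<const std::vector<T> > getVector()
{
    // atomically load the current snapshot of the pointer
    return std::atomic_load(&mVector);
}

// ...and in push_back, publish the new storage atomically:
std::atomic_store(&mVector, vec);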

One disadvantage to this approach, though, is that depending on how long readers hold that shared_ptr, you might end up with several copies of your data.

PS: hope I haven't made too many embarrassing errors in the code :-)

user2116939
  • Is this in fact *completely lockless*? If it is then it's absolutely perfect for my needs. Writes can be very expensive but reads need to be fast with the least amount of contention. – stgtscc Mar 15 '13 at 17:49
  • I would like to know if this would work and whether in fact no locking is necessary within the push_back() method. – stgtscc Mar 15 '13 at 17:59
  • @stgtscc: I'm pretty sure it is lockless _if and only if_ the `shared_ptr` is using hardware-specific atomic increments and decrements. For details on that, you'll need to find out what your compiler/library do for your particular hardware. – Mooing Duck Mar 15 '13 at 18:03
  • @stgtscc: I would advise you to verify this with your compiler/HW. GCC 4.6.3 on Intel produces `lock add %edx,(%rax)` to increment the reference count, so I'd assume it should be able to use lockless increments on other platforms that support atomic increment (or load/store) as well. – user2116939 Mar 17 '13 at 16:58

... using reserve() on a std::vector ...

This can only be useful if you can guarantee the vector will never need to grow. You've stated that the number of items is not bounded above, so you can't give that guarantee.
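For instance, with the v.reserve(50) from your example, the 51st push_back would re-allocate and invalidate every iterator a concurrent reader holds:

v.reserve(50);        // no re-allocation for the first 50 items
// ... 50 push_backs later ...
v.push_back(ptr);     // re-allocates: all iterators into v are now invalid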

Notwithstanding the linked question, you could conceivably use std::vector just to manage memory for you, but it would take an extra layer of logic on top to work around the problems identified in the accepted answer.


The actual answer is: the fastest thing to do is minimize the amount of synchronization. What the minimal amount of synchronization is depends on details of your code and usage that you haven't specified.


For example, I sketched a solution using a linked list of fixed-size chunks, shown below. This means your common use case should be as efficient as an array traversal, but you're able to grow dynamically without re-allocating.
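Something along these lines (a minimal sketch under stated assumptions: a single writer, as you've described, no removals, and an illustrative CHUNK_SIZE; not drop-in production code):

#include <atomic>
#include <cstddef>

template <typename T, std::size_t CHUNK_SIZE = 16>
class ChunkedList
{
    struct Chunk
    {
        T items[CHUNK_SIZE];            // T must be default-constructible (fine for pointers)
        std::atomic<std::size_t> count; // number of items published in this chunk
        std::atomic<Chunk*> next;
        Chunk() : count(0), next(0) {}
    };

    Chunk head;   // the first chunk lives inline, so an empty list allocates nothing
    Chunk* tail;  // only the single writer ever touches this

public:
    ChunkedList() : tail(&head) {}
    // destructor omitted for brevity: it would walk the list and delete the heap chunks

    // Single writer: fill the tail chunk, link a fresh one when it is full.
    void push_back(const T& t)
    {
        std::size_t n = tail->count.load(std::memory_order_relaxed);
        if (n == CHUNK_SIZE)
        {
            Chunk* c = new Chunk;
            c->items[0] = t;
            c->count.store(1, std::memory_order_release);   // publish the item...
            tail->next.store(c, std::memory_order_release); // ...then the chunk
            tail = c;
            return;
        }
        tail->items[n] = t;
        tail->count.store(n + 1, std::memory_order_release); // publish the item
    }

    // Readers: iterate whatever has been published so far, no locks taken.
    template <typename F>
    void for_each(F f) const
    {
        for (const Chunk* c = &head; c != 0; c = c->next.load(std::memory_order_acquire))
        {
            std::size_t n = c->count.load(std::memory_order_acquire);
            for (std::size_t i = 0; i < n; ++i)
                f(c->items[i]);
        }
    }
};

Each chunk traverses like a small array (good cache locality), and since existing chunks are never re-allocated, readers never see a dangling iterator.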

However, the implementation turns out to be sensitive to questions like:

  • whether you need to remove items
    • whenever they're read?
    • only from the front or from other places?
  • whether you want the reader to busy-wait if the container is empty
    • whether this should use some kind of backoff
  • what degree of consistency is required?
Useless
  • I added a code example if it helps, although I think the shared_ptr solution above might fit the bill. I hope the synchronization is minimized. – stgtscc Mar 15 '13 at 19:50