
I have a vector (a large buffer of strings) which I hope to read from a network thread while it is being written to by the main thread.

As `vector` is not thread-safe, the typical approach would be to lock access to it from both threads.

However, my insight has been that this should only be strictly necessary when a particular write resizes the vector, causing a reallocation that yanks the carpet out from under any concurrent readers. At all other times, provided the concurrent reader is not reading the entry currently being written, reading and writing the vector should be able to happen concurrently.

So I'd want to (somehow) only lock the mutex from the write thread when I know that the write I am about to do will cause a resize operation that reallocates the vector.

Is this practical? Can I deduce for certain, based on the value returned by `capacity()` and the current `size()`, whether or not the next `push_back` will reallocate?
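
For concreteness, here is a rough, untested sketch of the write path I have in mind; `buf`, `buf_mutex`, and `produce` are just placeholder names, and the synchronization the reader itself would need is deliberately left out:

#include <mutex>
#include <string>
#include <vector>

std::vector<std::string> buf;   // the shared buffer of strings
std::mutex buf_mutex;           // the reader would also have to take this around reallocations

void produce(const std::string& s)
{
    if (buf.size() == buf.capacity())   // the next push_back would have to reallocate
    {
        std::lock_guard<std::mutex> lock(buf_mutex);   // lock out the reader only in this case
        buf.push_back(s);
    }
    else
    {
        buf.push_back(s);   // no reallocation, so (the idea goes) no lock needed
    }
}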

Steven Lu
  • Yes, `capacity()` can be used to control reallocation, that is guaranteed. – Kerrek SB Mar 20 '14 at 18:37
  • You're probably going to have to consider the consequences of your reader reading at the same time your writer is writing **the same element**. – WhozCraig Mar 20 '14 at 18:38
  • Okay I guess the best way to go would be a little unit test suite that checks pointer values to verify the behavior of resizes – Steven Lu Mar 20 '14 at 18:38
  • @WhozCraig Yes I specified that caveat in the OP – Steven Lu Mar 20 '14 at 18:39
  • @StevenLu OK, so long as you know that's a *big* caveat. – WhozCraig Mar 20 '14 at 18:39
  • Not too hard to deal with if I e.g. keep a `size_t last_written` variable whose index the reader never reads past and which the writer only fills in after it finishes writing to that index. – Steven Lu Mar 20 '14 at 18:41
  • @StevenLu certainly, and I assume *that* (`last_written`) is interlocked on read and write, because otherwise you just moved your problem. – WhozCraig Mar 20 '14 at 18:43
  • Would that still hold if I changed it to say "whose index the reader never reaches"? – Steven Lu Mar 20 '14 at 18:48
  • @WhozCraig if there is only one thread writing to `last_written`, then it shouldn't be an issue. While `last_written` is being updated, other threads will either read in its old value, or its new value, as `size_t` is atomic. – Red Alert Mar 20 '14 at 18:53
  • What happens if the writer erases something which causes elements to be copied into new positions within the vector? You really need to look through every single member function in order to determine which methods might not only grow the vector, but which ones will cause elements to be copied or moved. Also, how do you know that simply locking the mutex will cause a performance issue? Perhaps you'd be better off taking the safer route until such time that you have proven that the performance degradation prevents your software from meeting requirements. – shawn1874 Mar 20 '14 at 18:56
  • @RedAlert really? it's *atomic*? OK, now I'm curious: where in the C++11 standard is `size_t` defined as implementation-independently *atomic*? I'm genuinely curious. And how does the compiler know this? I.e. what is to prevent the optimizer from caching the initial value and, upon observing no modification to it in the reader-loop, simply optimizing out any actual *read* refresh? (And a thousand cuts unto anyone that says the word "volatile" in this discussion.) And if it were truly atomic, the "if there is only one thread reading" wouldn't be a limiter in the first place. – WhozCraig Mar 20 '14 at 18:57
  • @StevenLu It doesn't have anything to do with what index the reader is reading from, you have the potential for concurrent access to `last_written`, which means the access needs to be synchronized, otherwise you have a data race and undefined behavior. Don't listen to RedAlert, there's no guarantee that `size_t` access is atomic. – Praetorian Mar 20 '14 at 19:00
  • @Praetorian i concur (in case it wasn't obvious). – WhozCraig Mar 20 '14 at 19:01
  • You're right, he should use `atomic_size_t` to be safe – Red Alert Mar 20 '14 at 19:02
  • @shawn1874 good points. Also thanks for clarification on potential non-atomicity of `size_t`. I was unaware. – Steven Lu Mar 20 '14 at 19:03
  • @StevenLu it's *not* atomic. Does your implementation support `std::atomic<>` and all its goodies? If so, use that for latching read/write to your top-end. The rest of your design seems worth pursuing. And don't let the atomic scare you. I can all-but guarantee you it will be far more efficient than a full-blown mutex latching down your entire collection, which is somewhat the point of avoidance in all of this. (and +1 for the interesting brain-food) – WhozCraig Mar 20 '14 at 19:04
  • Thanks, I haven't even dug deep enough to figure out how to use std::atomic. I just figured locking on all reads from the vector is absurd. – Steven Lu Mar 20 '14 at 19:08
  • @StevenLu it's *awesome*. So much nicer than trying to manage it using implementation-dependent intrinsics. – WhozCraig Mar 20 '14 at 19:09
  • Neat. Also, turns out I want to FIFO this data so now it's a std::deque. Hopefully that doesn't change things too much... crap. it might. – Steven Lu Mar 20 '14 at 19:10
  • Well now. [hmmm...](http://stackoverflow.com/q/3158379/340947) Either it just got a lot easier to deal with (no locking required?) or a lot harder. I might stick with the vector and periodically manually truncate it from the front. – Steven Lu Mar 20 '14 at 19:13
  • And why not use `std::list`? IMHO that is the solution for your problem; the overhead would be minor. As you said, you are storing strings anyway, writing new entries and taking old ones out of the container, so a list is the way to go. `vector` and `deque` have to resize/move data at some point; lists don't. – Piotr Tkaczyk Mar 20 '14 at 20:23
  • Another choice would be to use a circular buffer for such a task; check this solution, maybe it will help: http://www.boost.org/doc/libs/1_55_0/doc/html/circular_buffer.html – Piotr Tkaczyk Mar 20 '14 at 20:56
  • Ah yes... I may just use a `std::list`... – Steven Lu Mar 20 '14 at 21:01
  • Uncontended locks are very cheap. And if your lock is contended - well, that means you really need it. Most attempts to avoid locks are premature optimisation, with hard to diagnose bugs as a free bonus. – Alan Stokes Mar 20 '14 at 21:38
  • @AlanStokes Are you saying to try it first and when the race condition sh*t hits the fan only then do I start putting in thread synchronization??? That seems like an absurd way to go about coding something. However if you are simply saying to lock everything and then only bother with the optimization if the lock is contended often, then yes. Quite sound advice... Anyways, I switched to using a `std::list` instead of a `std::vector` for my application, and since the iterators won't get invalidated, the consumer (which reads) basically never has to acquire a lock (and'll never be blocked). Win! – Steven Lu Mar 20 '14 at 22:21
  • @AlanStokes BUT! I think I have a slight disagreement with your statement. The overzealous lock CAN be highly contended, however, during all this time it can be slowing down two threads which are performing tasks that can be permitted to occur concurrently! (that was the reasoning of this entire question to begin with) For example, most writes would not cause the vector to resize. Any and all reads from (other elements of) the vector can occur concurrently with such non-resizing writes. – Steven Lu Mar 20 '14 at 22:29
  • @StevenLu Using `std::list` doesn't magically solve your problem. `list::size` is required to be constant time from C++11 onward, which means both the producer and consumer will be accessing the internal `size` data member when pushing and popping elements from the list. If you don't use some kind of synchronization, you have undefined behavior. Anyway, this is my last comment on this question, since you seem all too keen on ignoring everyone's advice. Bottom line is, you **need** some lock somewhere whenever two or more threads are accessing shared memory. – Praetorian Mar 21 '14 at 00:14
  • Also, if you can use Boost, [boost::lockfree::spsc_queue](http://www.boost.org/doc/libs/1_55_0/doc/html/boost/lockfree/spsc_queue.html) sounds like it fits your use case perfectly. – Praetorian Mar 21 '14 at 00:18
  • @Praetorian spsc_queue is a nice find, thanks! Also, I don't think "being keen on ignoring advice" is a fair assessment. The technique using an `std::atomic` to limit read access within the vector seems like a perfectly serviceable solution for my original question. That `std::list` might still break on me without locking is indeed unfortunate but i do believe we have gone off the beaten path a bit, as far as the original question is concerned – Steven Lu Mar 21 '14 at 02:17
  • At any rate, the reason I am using `list` isn't because zodi said it and I thought it was a silver bullet. It was because I realized that my original requirement only calls for FIFO functionality so I really only need to choose between list and deque (queue being a deque). However this does not change the validity of the original question which is about the minimum amount of locking required for a single writer single reader vector where the vector is being filled by pushing. – Steven Lu Mar 21 '14 at 02:24
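
To make the `last_written` scheme discussed in these comments concrete, here is a rough single-writer/single-reader sketch using `std::atomic`, assuming the vector is sized up front so no write ever reallocates; the names are made up for illustration and this is not code from the discussion:

#include <atomic>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> buf(1000);              // pre-sized: no push_back, so no reallocation
std::atomic<std::size_t> last_written{0};        // number of entries that are safe to read

// single writer thread (assumes the next index is still within buf.size())
void producer_write(const std::string& s)
{
    std::size_t i = last_written.load(std::memory_order_relaxed);
    buf[i] = s;                                            // fill the slot first...
    last_written.store(i + 1, std::memory_order_release);  // ...then publish it
}

// single reader thread: never reaches an index that is still being written
bool consumer_read(std::size_t i, std::string& out)
{
    if (i >= last_written.load(std::memory_order_acquire))
        return false;
    out = buf[i];
    return true;
}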

1 Answer


I am assuming you are writing into the vector by doing `push_back`.

You can use Boost's `shared_mutex`, which allows multiple readers and one writer.

#include <string>
#include <vector>
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

using std::string;
using std::vector;

vector<string> vBuffer;
boost::shared_mutex _access;

void reader(size_t index, string& out)
{
  // get shared access; any number of readers may hold this at once
  boost::shared_lock<boost::shared_mutex> lock(_access);

  // now we have shared access
  out = vBuffer[index];
}

void writer(size_t index, const string& data)
{
  // get upgradable access; concurrent readers are still allowed
  boost::upgrade_lock<boost::shared_mutex> lock(_access);

  if (vBuffer.size() <= index) // the write needs the vector to grow (and possibly reallocate)
  {
      // get exclusive access
      boost::upgrade_to_unique_lock<boost::shared_mutex> uniqueLock(lock);
      // now we have exclusive access, so growing the vector is safe

      vBuffer.resize(index + 1);
  }
  // note: a reader may still be reading this element concurrently (see the comments above)
  vBuffer[index] = data;
}

I adapted this example from:

Example for boost shared_mutex (multiple reads/one write)?

Hope that helps; feel free to correct me if I am wrong.
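
If Boost is not an option: since C++17 the standard library has `std::shared_mutex` and `std::shared_lock`, but no upgradable lock, so a standard-only version has to take the exclusive lock for every write. A rough sketch of that variant, with made-up names, not a drop-in replacement for the code above:

#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

std::vector<std::string> buffer2;   // separate example buffer
std::shared_mutex mtx;

std::string read_at(std::size_t index)
{
    std::shared_lock<std::shared_mutex> lock(mtx);   // many readers at once
    return buffer2[index];
}

void write_at(std::size_t index, const std::string& data)
{
    std::unique_lock<std::shared_mutex> lock(mtx);   // exclusive for every write
    if (buffer2.size() <= index)
        buffer2.resize(index + 1);
    buffer2[index] = data;
}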

Gabriel