23

I've read on Stackoverflow that none of the STL containers are thread-safe for writing. But what does that mean in practice? Does it mean I should store writable data in plain arrays?

I expect concurrent calls to std::vector::push_back(element) could lead to inconsistent data structures becaue it might entail resizing the vector. But what about a case like this, where resizing is not involved:

  1. using an array:
int data[n];
// initialize values here...

#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    data[i] += func(i);
}
  1. using a `std::vector``:
std::vector<int> data;
data.resize(n);
// initialize values here...
    
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    data[i] += func(i);
}

Is the first implementation really better than the second one a) in terms of thread-safety and b) in terms of performance? I would prefer to use a std::vector, since I am less comfortable with C-style arrays.

EDIT: I removed a #pragma omp atomic update protecting the write.

Mohsin
  • 536
  • 6
  • 18
clstaudt
  • 21,436
  • 45
  • 156
  • 239
  • 1
    I am not certain enough to make it an answer, but I'm pretty sure writing to different elements of a `std::vector` is thread-safe. – Angew is no longer proud of SO Dec 19 '12 at 15:31
  • 1
    These two snippets are equally thread safe. – Andreas Brinck Dec 19 '12 at 15:33
  • "But what does that mean in practice?" : It means a container must be latched exclusively for both write *and* read if any/either operation coincides with a concurrent **write**. You can have all the readers banging on a container you want, but as soon as a write *potential* is introduced all bets are off and you must latch-down *all* access (not just other writers). A single-write-multi-read lock works well for this, btw. – WhozCraig Dec 19 '12 at 16:17

4 Answers4

27

The two are equally safe. Provided no element is accessed from multiple threads you're OK. Your parallel loop will access each element only once, and hence only from one thread.

There's space in the standard for the member functions of containers to be non-thread-safe. In this case you use vector<int>::operator[], so you'd want an explicit guarantee of thread-safety for that member, which seems reasonable since calling it even on a non-const vector doesn't modify the vector itself. So I doubt that there's a problem in this case, but I haven't looked for the guarantee [edit: rici found it]. Even if it's potentially unsafe, you could do int *dataptr = &data.front() before the loop and then index off dataptr instead of data.

As an aside, this code is not guaranteed safe for vector<bool>, since it's a special-case for which multiple elements co-exist inside one object. It would be safe for an array of bool, since the different elements of that are different "memory locations" (1.7 in C++11).

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • I wrote a small test function to see if openmp was functioning correctly. I used `vector`, resized it to the number of threads used (initialized to false), and set all values to true concurrently. If every value was true, then the function returned true to show that all threads were spawning correctly. `vector` has struck again. Luckily, I actually read the relevant part of the standard recently so I was able to figure it out, but if anyone stumbles across this, they need to make sure they avoid concurrent writes to `vector`. – JustinBlaber Jul 02 '15 at 20:06
  • going by this, it should be safe to push_back to individual vectors under `std::vector>` from each corresponding separate thread? – Abhinav Gauniyal Apr 05 '17 at 02:18
20

For c++11, which specifies rules for data races, the thread-safety of containers is described. A relevant section of the standard is § 23.2.2, paragraph 2:

Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.

[ Note: For a vector<int> x with a size greater than one, x[1] = 5 and *x.begin() = 10 can be executed concurrently without a data race, but x[0] = 5 and *x.begin() = 10 executed concurrently may result in a data race. As an exception to the general rule, for a vector<bool> y, y[0] = true may race with y[1] = true. —end note ]

The mentioned § 17.6.5.9 essentially bans any concurrent modification by any standard library interface unless specifically allowed, so the section I quote tells you exactly what's allowed (and that includes your usage).

Since the question was raised by Steve Jessop, paragraph 1 of § 23.2.2 explicitly allows the concurrent use of [] in sequence containers:

For purposes of avoiding data races (17.6.5.9), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].

Community
  • 1
  • 1
rici
  • 234,347
  • 28
  • 237
  • 341
3

The main thing it means is that if you have multiple threads accessing the vector, you can't depend on C++ to keep you from corrupting the data structure with multiple concurrent writes. So you need to use some kind of guard. On the other hand, if your program doesn't use multiple threads, as your examples don't seem to, you're perfectly fine.

Charlie Martin
  • 110,348
  • 25
  • 193
  • 263
  • 2
    Upvoted to 0. This was probably downvoted as incorrect but was posted before the edit that removed: #pragma omp atomic update, which does prevent concurrent writing and makes this answer correct in the context of the original post –  Jul 29 '13 at 02:53
1

In this case you should just construct your vector with necessary number of values? and all will work fine.

std::vector<int> data(n, 0);

resize() works well to. The performance will be equal. The reason why multithread access will not corrupt the vector is: your data is located at its place and will not move from there. OMP threads will not access to the same element at a time.

Jurlie
  • 1,014
  • 10
  • 27