Thread sanitizer warnings after using parallel std::for_each

Question

I created the following simple test program to see how parallel execution works with std::for_each.

#include <iostream>
#include <vector>
#include <execution>

int main(int ac, char**av){
    constexpr int size=5;
    std::vector<int> v;
    std::vector<int> expected;
    for(int i=0; i<size; ++i) v.push_back(i);
    expected.resize(size);

    std::for_each(std::execution::par, v.begin(), v.end(), [&](auto x){ expected[x]=x; });
    auto eq = std::equal(v.begin(), v.end(), expected.begin());
    std::cout << "Compare: "<<eq<<"\n";
    return 0;
}

The program runs without any problem, however if I link it with thread sanitized I get data race warnings. Here is the program output:

Compare: 1
==================
WARNING: ThreadSanitizer: data race (pid=47090)
  Write of size 8 at 0x7fab5399b200 by thread T8:
    #0 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 (libtsan.so.10+0x35bc5)
    #1 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:760 (libtsan.so.10+0x35bc5)
    #2 rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long) ../../src/tbbmalloc/frontend.cpp:888 (libtbbmalloc.so.2+0x13700)

  Previous write of size 8 at 0x7fab5399b200 by thread T10:
    #0 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 (libtsan.so.10+0x35bc5)
    #1 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:760 (libtsan.so.10+0x35bc5)
    #2 rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long) ../../src/tbbmalloc/frontend.cpp:888 (libtbbmalloc.so.2+0x13700)

  Thread T8 (tid=47099, running) created by thread T4 at:
    #0 pthread_create /tp_src/gcc-9.2.0/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.10+0x3057b)
    #1 rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) ../../src/tbb/../rml/server/thread_monitor.h:218 (libtbb.so.2+0x20ab8)
    #2 tbb::internal::rml::private_worker::wake_or_launch() ../../src/tbb/private_server.cpp:297 (libtbb.so.2+0x20ab8)
    #3 tbb::internal::rml::private_server::wake_some(int) ../../src/tbb/private_server.cpp:395 (libtbb.so.2+0x20ab8)

  Thread T10 (tid=47101, running) created by thread T3 at:
    #0 pthread_create /tp_src/gcc-9.2.0/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.10+0x3057b)
    #1 rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) ../../src/tbb/../rml/server/thread_monitor.h:218 (libtbb.so.2+0x20ab8)
    #2 tbb::internal::rml::private_worker::wake_or_launch() ../../src/tbb/private_server.cpp:297 (libtbb.so.2+0x20ab8)
    #3 tbb::internal::rml::private_server::wake_some(int) ../../src/tbb/private_server.cpp:395 (libtbb.so.2+0x20ab8)

SUMMARY: ThreadSanitizer: data race ../../src/tbbmalloc/frontend.cpp:888 in rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long)       [83/16871]
==================

It looks like for_each has completed and comparison in the end have succeeded. However some background threads make thread sanitizer unhappy during completion of main.

Are there any issues with this example or this is a bug or a false positive warning in thread sanitizer and I can ignore it?

Here is how I compile it:

g++ -O3 -I$TBBINC -std=c++17 -fsanitize=thread  ForEach.cpp $TBBLIB/libtbb.so -o ForEach

I don't think you are allowed to modify other elements inside UnaryFunction except the element you are given. — bolov, Dec 05 '21 at 00:05
I do not think that's the issue. Also even if I make this lambda empty, I still have the same warnings. — Dmitriy Kumshayev, Dec 05 '21 at 00:09
In that case, it might make sense to make the body of the lambda empty in your snippet. Also, do you still get a warning if the lambda in for_each doesn't capture by reference? — cigien, Dec 05 '21 at 01:27
yes, if lambda is empty and does not capture anything I still get the warnings. — Dmitriy Kumshayev, Dec 05 '21 at 02:14
Probably something internal to the parallel foreach implementation, then. Not much to do aside from filing a bug report if there isn't one already. — Shawn, Dec 05 '21 at 06:32
Indeed there are reported issues with tbb with thread sanitizer : https://github.com/oneapi-src/oneTBB/issues/358 — Dmitriy Kumshayev, Dec 05 '21 at 18:17
The one question I have here is whether there is implicit synchronization before and after the `for_each`. For instance, does the `expected.resize(size)` in the main thread *happen-before* all of the stores to `expected.x` in the possible worker threads? And do those stores in turn *happen-before* the loads of `expected` in the main thread's `std::equal` below? I would think they are supposed to, but I can't immediately find where the standard says so. But if they do then I can't see any race at all. — Nate Eldredge, Dec 08 '21 at 13:20
@bolov: The UnaryFunction doesn't modify any element of `v`. It modifies elements of the unrelated `expected` vector, and that is allowed as far as I know, provided that data races are avoided. The C++20 standard even includes an example where the UnaryFunction modifies an outside object (algorithms.parallel.exec p6, third example). — Nate Eldredge, Dec 08 '21 at 16:05
I asked a new question about synchronization of `for_each` with the surrounding code: https://stackoverflow.com/questions/70278724/do-parallel-algorithms-like-for-each-synchronize-with-surrounding-code — Nate Eldredge, Dec 08 '21 at 16:48

score 0 · Answer 1 · answered Dec 08 '21 at 07:29

TL;DR: There is a data-race. The error message is senseless to me, however.

std::foreach guarantees order of execution.
std::vector is not thread-safe. int& could not be less thread-safe if you tried.
There is no mutex lock. There is no atomic. So, no sequential consistency can be guaranteed.

A visible side effect A on a scalar object or bit-field M with respect to a value computation B of M satisfies the conditions:
— A happens before B and
— there is no other side effect X to M such that A happens before X and X happens before B.
The value of a non-atomic scalar object or bit-field M , as determined by evaluation B, shall be the value stored by the visible side effect A.
[ Note: If there is ambiguity about which side effect to a non-atomic object or bit-field is visible, then the behavior is either unspecified or undefined. — end note ]
[ Note: This states that operations on ordinary objects are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined below, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution. — end note ]

See: [intro.multithread]

Aside: You have three non-atomic loads, two non-atomic stores.
Two loads happen before the first store. Those two loads, on x86-64 may be rearranged. They both fetch the same segment of non-attomic, unlocked, memory.

Thankfully, your algorithm would only have defined behavior if it was an implementation of std::iota with quadratic time complexity.

int main() {
  constexpr auto kSize = std::size_t(5);
  const auto expected = [] (std::size_t length) {
    auto self = std::vector<int>(length);
    std::iota(self.begin(), self.end(), 0);
    return self;
  } (kSize);
  // Now the raw loop… (Can't be bothered to use a lambda this time)
  auto actual = std::vector<int>(kSize);
  for (auto i = std::size_t(0); i != kSize; ++i)
    actual[i] = i; // No UB. Single thread. 

  std::cout << "Compare: " << (actual == expected);
  std::endl(std::cout);
}

If the value of kSize is not exactly representable by an int the behavior is not my problem.
Note: untested code. If it doesn't work: replace with std::puts("Compare: 1");

I'm not sure I understand what race you are seeing. Are you thinking that the accesses to `v` and `expected` within the lambda are racing with the accesses in the main thread before and after the `for_each`? I certainly don't see any data race happening between the worker threads themselves, as they all store to distinct elements. — Nate Eldredge, Dec 08 '21 at 15:48
Agree with the previous comment. as different threads access different elements of vector, i.e. there is no memory contention. Also as I mentioned in previous comments that even if lambda is completely empty (no body, no capture), I still get the same warnings. — Dmitriy Kumshayev, Dec 10 '21 at 10:12
I also agree `v` is not the problem here. The problem is the vector that isn't getting locked but gets modified. `expected`. It should be locked before assigning to one of its elements. I'm not sure if the behavior is defined/specified. Whilst `x` cannot change, `expected[x]` could have changed. I don't think it's determinedly sequence. The value computation is sequenced-before, not the side effects. I agree: no race-condition, only a data-race because the same, non-atomic, unlocked, object is modified by multiple threads. See: https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t — viraltaco_, Dec 11 '21 at 14:06

Thread sanitizer warnings after using parallel std::for_each

1 Answers1

Linked