I'm using a lock free stack (via tagged pointers) to manage a pool of small blocks of memory. The list nodes are created and destroyed in-place when the blocks are inserted into, and removed from, the pool.
This is a very simplified test program, which only pops from the stack. So, no ABA problem and no tagged pointers. It is sufficient to demonstrate the race I'm running into:
#include <atomic>
#include <list>
#include <thread>
#include <type_traits>
struct Node {
Node() = default;
Node(Node *n) { next.store(n); }
std::atomic<Node *> next;
};
using Memory = std::aligned_storage_t<sizeof(Node)>;
struct Stack {
bool pop_and_use() {
for (Node *current_head = head.load(); current_head;) {
Node *next = current_head->next.load(); // READ RACE
if (head.compare_exchange_weak(current_head, next, std::memory_order_seq_cst)) {
current_head->~Node();
Memory *mem = reinterpret_cast<Memory *>(current_head);
new (mem) int{0}; // use memory with non-atomic write (WRITE RACE)
return true;
}
}
return false;
}
void populate(Memory *mem, int count) {
for (int i = 0; i < count; ++i) {
head = new (mem + i) Node(head.load());
}
}
std::atomic<Node *> head{};
};
int main() {
Memory storage[10000];
Stack test_list;
test_list.populate(storage, 10000);
std::thread worker([&test_list]() {
while (test_list.pop_and_use()) {
};
});
while (test_list.pop_and_use()) {};
worker.join();
return 0;
}
Thread sanitizer reports the following error:
clang++-10 -fsanitize=thread tsan_test_2.cpp -o tsan_test_2 -O2 -g2 -Wall -Wextra && ./tsan_test_2
LLVMSymbolizer: error reading file: No such file or directory
==================
WARNING: ThreadSanitizer: data race (pid=35998)
Atomic read of size 8 at 0x7fff48bd57b0 by thread T1:
#0 __tsan_atomic64_load <null> (tsan_test_2+0x46d88e)
#1 std::__atomic_base<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/atomic_base.h:713:9 (tsan_test_2+0x4b3e6c)
#2 std::atomic<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/atomic:452:21 (tsan_test_2+0x4b3e6c)
#3 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:17:39 (tsan_test_2+0x4b3e6c)
#4 main::$_0::operator()() const /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:40:22 (tsan_test_2+0x4b3e6c)
#5 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60:14 (tsan_test_2+0x4b3e6c)
#6 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95:14 (tsan_test_2+0x4b3e6c)
#7 decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244:13 (tsan_test_2+0x4b3e6c)
#8 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253:11 (tsan_test_2+0x4b3e6c)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196:13 (tsan_test_2+0x4b3e6c)
#10 <null> <null> (libstdc++.so.6+0xbd6de)
Previous write of size 4 at 0x7fff48bd57b0 by main thread:
#0 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:21:9 (tsan_test_2+0x4b3d5d)
#1 main /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:43:20 (tsan_test_2+0x4b3d5d)
Location is stack of main thread.
Location is global '??' at 0x7fff48bad000 ([stack]+0x0000000287b0)
Thread T1 (tid=36000, running) created by main thread at:
#0 pthread_create <null> (tsan_test_2+0x4246bb)
#1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd994)
#2 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 (libc.so.6+0x21b96)
SUMMARY: ThreadSanitizer: data race (/home/BOSDYN/akhripin/tmp/tsan_test_2+0x46d88e) in __tsan_atomic64_load
==================
ThreadSanitizer: reported 1 warnings
The problem arises when the two threads read the same value of current_head
, but one of them completes the pop and overwrites the node before the other has a chance to read current_head->next
.
This is similar to the problem discussed here: Why would 'deleting' nodes in this lock-free stack class would cause race condition? except the memory is not actually being deallocated.
I know that from the machine's perspective, this race is benign -- if the read race occurs, the compare-and-swap will not succeed -- but I think this is still getting into undefined behavior territory in C++.
- Is there any way to write this code without getting a race condition?
- Is there any way to annotate the code to make thread sanitizer ignore it? I experimented with
__tsan_acquire
and__tsan_release
but could not find something that consistently worked.
Update I'm pretty convinced that there is no way to perform the atomic read safely in standard C++ -- the object just doesn't exist any more. But -- can I go from relying on undefined behavior to relying on implementation-defined behavior? What's the best I could do, given typical architectures and toolchains (x86/ARM, gcc/clang)?
Update 2 One implementation-specific approach that seems to work is to replace the load with inline assembly:
inline Node *load_next_wrapper(Node *h) {
Node *ret;
asm volatile("movq (%1), %0" : "=r"(ret) : "r"(&h->next));
return ret;
}
This is both architecture and compiler specific -- but I think this does replace "undefined" behavior with "implementation-defined" behavior.