std::atomic segmentation fault on WSL Ubuntu 20.04 LTS with GCC 9.x

Question

Based on this question, I extended std::mutex so that I can check if the mutex is already locked by the current thread:

class mutex : public std::mutex
{
  public:
    void lock()
    {
      std::mutex::lock();
      lock_holder_ = std::this_thread::get_id();
    }

    void unlock()
    {
      lock_holder_ = std::thread::id();
      std::mutex::unlock();
    }
 
    [[nodiscard]] bool locked_by_caller() const
    {
      return lock_holder_ == std::this_thread::get_id();
    }

    private:
      std::atomic<std::thread::id> lock_holder_;
};

On Linux with GCC 9.x (via WSL on Windows 10, using Ubuntu 20.04 LTS), there is a segmentation fault thrown in the locked_by_caller() function. When using Valgrind I get the following output:

==2120== Memcheck, a memory error detector
==2120== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2120== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2120== Command: ./WSL-GCC-Debug/eventbus/eventbus.tests
==2120==
==2120== error calling PR_SET_PTRACER, vgdb might block
Running main() from /mnt/d/repositories/eventbus/eventbus-src/out/build/WSL-GCC-Debug/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from EventBusTestFixture
[ RUN      ] EventBusTestFixture.LambdaRegistrationAndDeregistration
Lambda 3 take by copy.
Lambda 1
Lambda 3 take by copy.
Lambda 1
Lambda 3 take by copy.
Lambda 3 take by copy.
[       OK ] EventBusTestFixture.LambdaRegistrationAndDeregistration (83 ms)
[ RUN      ] EventBusTestFixture.RegisterWhileDispatching
Loop 0 Handler count: 2
==2120== Invalid read of size 8
==2120==    at 0x11C036: std::atomic<std::thread::id>::load(std::memory_order) const (atomic:254)
==2120==    by 0x11AE5A: std::atomic<std::thread::id>::operator std::thread::id() const (atomic:207)
==2120==    by 0x11A22E: dp::event_bus::mutex::locked_by_caller() const (event_bus.hpp:153)

And gdb outputs the following:

Program received signal SIGSEGV, Segmentation fault.
std::atomic<std::thread::id>::load (this=0x2a, __m=std::memory_order_seq_cst) at /usr/include/c++/9/atomic:254
254     __atomic_load(std::__addressof(_M_i), __ptr, int(__m));
Segmentation fault

I noticed the invalid read of size 8 and after doing research I found that this was a bug (also on Stackoverflow) in an older version of GCC but was fixed in 5.x. Is this still an alignment problem because I'm using std::thread::id in an std::atomic? Are there any work around to this issue?

Full code (for context) is on my Github.

"*so that I can check if the mutex is already locked by the current thread:*" ... it would really be better if you just rearranged your code so that you wouldn't *have* to check. If you don't *know* if you've locked a particular mutex, it feels like you've written your code wrong. Or just use `recursive_mutex`. — Nicol Bolas, Sep 17 '20 at 14:17
@Nicol Bolas The above code is used in a event dispatcher and I'm trying to handle the case where a "bad" event callback tried to register or deregister a listener inside its event callback (which is bad). The event dispatcher map that I used needs to be thread safe, but obviously I can't call `lock()` multiple times on the same thread b/c that UB. Using a `recursive_mutex` breaks the logic of locking the map I'm using. I haven't found another solution other than the one above that works the way I want. — Developer Paul, Sep 17 '20 at 14:42
How do you use your mutex? If it's a member of some class then it likely is the alignment problem, not a bug in gcc. — 273K, Sep 17 '20 at 14:46
@S.M. It's defined inside another class privately and is used in a templated function I used internally to access the event dispatch map. Moving it's definition outside the class still results in a segmentation fault. — Developer Paul, Sep 17 '20 at 14:49
Use GCC `__attribute__ ((aligned (8)))` for your class declaration. I'm not sure this attribute is inherited. Use the `alignof` operator to determine the actual alignment. — 273K, Sep 17 '20 at 14:59
Run it under `gdb` to see the read of what address is triggering the segfault. My crystal ball tells me it's probably uninitialized memory and/or 0. — rustyx, Sep 17 '20 at 15:02
@rustyx I still only get the following output: Program received signal SIGSEGV, Segmentation fault. std::atomic::load (this=0x2a, __m=std::memory_order_seq_cst) at /usr/include/c++/9/atomic:254 254 __atomic_load(std::__addressof(_M_i), __ptr, int(__m)); — Developer Paul, Sep 17 '20 at 15:16
`this=0x2a` you are dereferencing parent nullptr. And mutex is not aligned properly inside the parent. — 273K, Sep 17 '20 at 15:21
@S.M. Maybe I'm just missing something, but I don't see where this dereference is happening? — Developer Paul, Sep 17 '20 at 15:28
@S.M.: struct / class objects do inherit the `alignof` of their most-aligned member. You don't need to manually hand-hold the compiler with `alignas()` to make aligned members work properly. If you're dynamically allocating storage for the class object, you *do* have to make sure it's sufficiently aligned, which is non-trivial if that alignment is greater than `alignof(max_align_t)`, especially before C++17. (e.g. `std::vector<__m256i>` was typically broken because older C++ sucked.) But a mutex should be fine unless a custom allocator under-aligns, less than max_align_t (at least 8). — Peter Cordes, Sep 17 '20 at 15:38
After attempting to make a minimal reproducible [example](https://www.onlinegdb.com/edit/Bkmj5BbrP), I realized that the issue was actually elsewhere. I have another callback item that was using another atomic inside of it as a counter. This object was going out of scope and then being used later which seems to have been calling the segfault. I have since fixed the issue. — Developer Paul, Sep 17 '20 at 20:46

std::atomic segmentation fault on WSL Ubuntu 20.04 LTS with GCC 9.x

0 Answers0