
I have a program that uses a thread_local std::shared_ptr to manage objects that are mainly accessed thread-locally. However, when the thread is joined and the thread-local shared_ptr is destroyed, the program always receives SIGSEGV under the debugger if it was compiled with MinGW (Windows 10). Here is a minimal example that reproduces the bug:

// main.cpp
#include <memory>
#include <thread>

void f() {
    thread_local std::shared_ptr<int> ptr = std::make_shared<int>(0);
}

int main() {
    std::thread th(f);
    th.join();
    return 0;
}

How to compile:

g++ main.cpp -o build\main.exe -std=c++17

Compiler version:

>g++ --version
g++ (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When run under gdb, the program receives SIGSEGV in the new thread while the main thread is waiting in join(). The same code works fine when compiled with gcc or clang on Linux, or with MSVC on Windows.

While debugging I found that a contiguous block of memory containing the thread-local shared_ptr is overwritten with repeated 0xfeeefeee before the destructor runs, during a call to RtlpWow64SetContextOnAmd64. The stack frames:

RtlpWow64SetContextOnAmd64 0x00007ffd8f4deb5f
RtlpWow64SetContextOnAmd64 0x00007ffd8f4de978
SbSelectProcedure 0x00007ffd8f4ae2e0
CloseHandle 0x00007ffd8ce3655b
pthread_create_wrapper 0x00007ffd73934bac
_beginthreadex 0x00007ffd8e9baf5a
_endthreadex 0x00007ffd8e9bb02c
BaseThreadInitThunk 0x00007ffd8ec87614
RtlUserThreadStart 0x00007ffd8f4c26a1

The assembly:

...
mov    %rax,(%rdi)
movdqu %xmm0,(%rsi)               ; <------ erased here
call   0x7ffd8f491920             ; <ntdll!RtlReleaseSRWLockShared>
mov    $0x1,%r9d
mov    0x30(%rsp),%rbx
...

Later the shared_ptr destructor runs, reads 0xfeeefeee from the overwritten memory, and the thread receives SIGSEGV.

I would like to know:

  • Why does MinGW (or a Windows library?) erase the thread-local storage before destruction? In my opinion, the memory should only be erased after the destructor has run. I noticed that if join() is replaced by detach(), the program exits normally. Maybe join() does something that tells the new thread to erase the storage?
  • Does this behavior violate the standard? I think the standard should forbid erasing the memory before destruction. Please correct me if I'm mistaken.
Aamir
Antares
  • Mixing `thread_local` with a local variable is an unusual use case. I guess `ptr` is destroyed at the end of `f`. I suggest moving the variable to the global scope. On the other hand, it implies a static local variable: https://stackoverflow.com/a/22794640/6752050 – 273K Jan 22 '23 at 18:06
  • Maybe off topic, but 0xfeeefeee denotes a previously deallocated block. (https://en.wikipedia.org/wiki/Magic_number_(programming)#Debug_values). – Pepijn Kramer Jan 22 '23 at 18:06
  • As said, having a thread-global variable as a local seems strange, as does wanting something that is thread-local to be shared. By definition a thread-local variable shares its lifecycle with the lifecycle of the thread and should not be shareable by other threads. So what are you trying to do? – Pepijn Kramer Jan 22 '23 at 18:08
  • @273K It doesn't work. ```thread_local std::shared_ptr<int> ptr; void f() { ptr = std::make_shared<int>(0); } ``` will produce SIGSEGV when reading the same address `0xfeeefeee` – Antares Jan 22 '23 at 18:10
  • What mingw do you use? From msys2? – 273K Jan 22 '23 at 18:11
  • I can't reproduce on MSYS2 GCC (the MINGW64 one). Try it. – HolyBlackCat Jan 22 '23 at 18:12
  • @PepijnKramer In fact I'm using `std::shared_ptr` to control some memory allocation, but it may be off-topic here. I'm more curious about why this is happening. – Antares Jan 22 '23 at 18:15
  • Try run `gcc -v` or `g++ -v` and ensure the flag `--enable-tls` is set. – 273K Jan 22 '23 at 18:17
  • @273K Mingw-builds from https://www.mingw-w64.org/downloads/ , github page: https://github.com/niXman/mingw-builds-binaries/releases – Antares Jan 22 '23 at 18:18
  • Prefer using msys2 https://www.msys2.org/ – 273K Jan 22 '23 at 18:21
  • @273K Thank you, I'll try later; it's late at night here. – Antares Jan 22 '23 at 18:28
  • It is somewhat strange: from what I can make out (https://en.cppreference.com/w/cpp/memory/monotonic_buffer_resource) your scope-local variable will automatically also become a static variable and share its lifetime with that of the thread. Since this works in the two other compilers, this could be a bug. Have you checked the gcc bug list? (I don't know where that is; I usually use msvc, or sometimes clang.) – Pepijn Kramer Jan 22 '23 at 18:38
  • Try on godbolt.org with different compiler versions; maybe if you select a newer one than the one you are using, the problem is gone. – Pepijn Kramer Jan 22 '23 at 18:39
  • Doesn't compile on gcc 4.7.4, but from 4.8.1 it does compile and I see no issue: https://godbolt.org/z/qEGq4e9Pv. Side note: consider using std::make_unique (not related to this problem). – Pepijn Kramer Jan 22 '23 at 18:51
  • btw, I can reproduce this problem on Win 11, g++ (Rev6, Built by MSYS2 project) 12.2.0. It shows up as ```Thread 5 received signal SIGSEGV, Segmentation fault. [Switching to Thread 15196.0x27dc] 0x00007ff7e54133f4 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () (gdb) x/i $rbp 0xfb957ff4b0: loopne 0xfb957ff4a6``` – Eritque arcus Jan 22 '23 at 19:20
  • Just an opinion, not an answer to your question, but I think you should shun `thread_local`. It's wrong for at least two reasons. First of all, `thread_local` only makes sense for statically allocated objects, and static allocation is the enemy of scalable, testable, and re-useable code. Second of all, it encourages the use of specialized threads. IMO, it's better to write in a style where, if something needs to be done, it should not matter which thread does it. – Solomon Slow Jan 22 '23 at 19:48
  • @SolomonSlow fyi _"...If thread_local is the only storage class specifier applied to a block scope variable, static is also implied...."_ https://en.cppreference.com/w/cpp/language/storage_duration – Richard Critten Jan 22 '23 at 21:54

1 Answer

This is a long-standing, known, and still open bug in MinGW; see the corresponding issue, with analyses and links, on GitHub: https://github.com/msys2/MINGW-packages/issues/2519

Yes, this violates the standard: the program shouldn't crash. Basically, the order of destruction is incorrect, as you already suspected. 0xfeeefeee is the magic number that HeapFree() uses to mark freed memory. See e.g. this post.
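
To check whether a given toolchain shows the problem, something like the following probe might help. It is only a sketch: `Canary`, `touch_canary` and `probe_thread_local_destruction` are made-up names, not from the linked issue, and it assumes the corruption hits the thread_local object's own storage, as described in the question. On the affected setup the overwrite was observed under a debugger, so the probe may need to run there to show anything.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical probe type: stores a known value and re-checks it in its
// destructor. The member is volatile so the comparison is not folded away.
struct Canary {
    volatile std::uint32_t value = 0xC0FFEE42u;
    // 1 = value intact at destruction, 0 = overwritten, -1 = destructor not run
    static std::atomic<int> intact;
    ~Canary() { intact.store(value == 0xC0FFEE42u ? 1 : 0); }
};
std::atomic<int> Canary::intact{-1};

void touch_canary() {
    // Block-scope thread_local: constructed when this thread first gets here,
    // destroyed when the thread exits.
    thread_local Canary canary;
    (void)canary;
}

// Runs one worker thread and reports what the Canary destructor observed.
int probe_thread_local_destruction() {
    Canary::intact.store(-1);
    std::thread worker(touch_canary);
    worker.join();  // wait for the worker, including its thread-exit cleanup
    return Canary::intact.load();
}
```

On a conforming implementation `probe_thread_local_destruction()` should return 1. A return value of 0 (or a crash) would suggest the storage was released before the destructor ran.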

To quote lhmouse:

So here comes the rule of thumb: Don't use thread_local on GCC for MinGW targets.
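
One way to follow that rule while still keeping per-thread state is to make the state an ordinary local of the thread's entry function and pass it down explicitly. The sketch below is an assumption about how the original program could be restructured, not something taken from the linked issue; `WorkerState`, `do_work`, `worker_main` and `run_workers` are hypothetical names.

```cpp
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical per-thread state; stands in for whatever the
// thread_local shared_ptr was managing.
struct WorkerState {
    std::shared_ptr<int> counter = std::make_shared<int>(0);
};

// Helpers take the state as a parameter instead of reading a thread_local.
void do_work(WorkerState& state) { ++*state.counter; }

// Thread entry point: the state is a plain local variable, so it is
// destroyed when this function returns, before any thread-exit cleanup.
void worker_main(int iterations, int& result) {
    WorkerState state;
    for (int i = 0; i < iterations; ++i) do_work(state);
    result = *state.counter;
}

// Starts `threads` workers, joins them, and returns the sum of their counters.
int run_workers(int threads, int iterations) {
    std::vector<int> results(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker_main, iterations, std::ref(results[t]));
    for (auto& th : pool) th.join();
    int total = 0;
    for (int r : results) total += r;
    return total;
}
```

For example, `run_workers(4, 10)` gives each of four threads its own counter and returns 40. The cost is that the state has to be threaded through function parameters, which is more intrusive than `thread_local` but does not depend on the runtime's thread-exit ordering.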

Sedenion