static thread_local and thread_local at block scope are equivalent. thread_local has thread storage duration, not static or automatic, so adding static to it has no effect on the storage duration: a bare thread_local at block scope is implicitly treated as static thread_local, and either spelling simply means thread storage duration. static doesn't modify the linkage at block scope either (a block-scope variable always has no linkage), so there is nothing else left for it to change. extern thread_local is also possible at block scope.
static thread_local at file scope gives the thread_local variable internal linkage, which means there will be one copy per translation unit in the TLS (each translation unit will resolve to its own variable at the TLS index for the .exe, because the assembler will place the variable in the .tls$ section of the .o file and mark it in the symbol table as a local symbol, since there is no .global directive for the symbol). extern thread_local at file scope is legal, as it is at block scope, and uses the thread_local copy defined in another translation unit. thread_local at file scope is not implicitly static, because it can provide a global symbol definition for another translation unit, which a block-scope variable cannot do.
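To make the block-scope equivalence concrete, here is a minimal sketch. The function names and the helper second_call_on_new_thread are made up for illustration. Both counters keep their value between calls on the same thread, and a new thread starts again from its own fresh copy, whichever spelling is used.

```cpp
#include <thread>

// Bare block-scope thread_local: static is implied.
int next_plain() {
    thread_local int n = 0;
    return ++n;
}

// Explicit static: same thread storage duration, still no linkage.
int next_static() {
    static thread_local int n = 0;
    return ++n;
}

// Hypothetical helper: calls f twice on a fresh thread and returns the
// second result, which is 2 if that thread got its own zero-initialised copy.
int second_call_on_new_thread(int (*f)()) {
    int result = 0;
    std::thread t([&] {
        f();
        result = f();
    });
    t.join();
    return result;
}
```

Calling either function repeatedly on one thread counts 1, 2, 3, and so on, while second_call_on_new_thread returns 2 for both regardless of how often the calling thread has already used them.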
The compiler will store all initialised thread_local variables (including block-scope ones) in .tdata and uninitialised ones in .tbss for ELF, or all of them in .tls for the PE format. I presume the thread library (or the loader), when creating a thread, will read the .tls section and perform the equivalent of the Windows API calls TlsAlloc and TlsSetValue: allocate the variables for each .exe and .dll on the heap, place a pointer to them in the TLS array of the thread's TEB (reached through the GS segment) and return the index allocated, as well as call the DLL_THREAD_ATTACH routines of dynamic libraries. Presumably, a pointer to a value in the space defined by _tls_start and _tls_end is what's passed to TlsSetValue as the value pointer.
The difference between file-scope static/extern thread_local and block-scope (extern) thread_local is the same as the general difference between file-scope static/extern and block-scope static/extern: the name of a block-scope thread_local variable goes out of scope at the end of the block it is declared in, although the object can still be returned and accessed by address, because it has thread storage duration and lives until the thread exits.
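A small sketch of that last point, with made-up names (slot, new_thread_gets_fresh_copy): the address of a block-scope thread_local stays valid after the function returns, and another thread calling the same function gets the address of its own copy.

```cpp
#include <thread>

// The name `value` is only visible inside slot(), but the object lives
// for the whole lifetime of the calling thread, so returning its address is fine.
int* slot() {
    thread_local int value = 0;
    return &value;
}

// Writes through the returned pointer on the calling thread, then checks
// that a second thread sees a different object that still holds 0.
bool new_thread_gets_fresh_copy() {
    int* mine = slot();
    *mine = 42;

    bool distinct = false;
    int seen = -1;
    std::thread t([&] {
        int* theirs = slot();
        distinct = (theirs != mine);  // compared while both threads are alive
        seen = *theirs;
    });
    t.join();

    return distinct && seen == 0 && *mine == 42;
}
```

The pointer must not be used after the thread that owns the variable has exited, which is why the comparison is done inside the second thread.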
The compiler knows the offset of the data within the .tls section, so it can emit accesses through the GS segment directly, as can be seen on godbolt.
MSVC
thread_local int a = 5;
int square(int num) {
    thread_local int i = 5;
    return a * i;
}
_TLS SEGMENT
int a DD 05H ; a
_TLS ENDS
_TLS SEGMENT
int `int square(int)'::`2'::i DD 05H ; `square'::`2'::i
_TLS ENDS
num$ = 8
int square(int) PROC ; square
mov DWORD PTR [rsp+8], ecx
mov eax, OFFSET FLAT:int a ; a
mov eax, eax
mov ecx, DWORD PTR _tls_index
mov rdx, QWORD PTR gs:88
mov rcx, QWORD PTR [rdx+rcx*8]
mov edx, OFFSET FLAT:int `int square(int)'::`2'::i
mov edx, edx
mov r8d, DWORD PTR _tls_index
mov r9, QWORD PTR gs:88
mov r8, QWORD PTR [r9+r8*8]
mov eax, DWORD PTR [rcx+rax]
imul eax, DWORD PTR [r8+rdx]
ret 0
int square(int) ENDP ; square
This loads a 64-bit pointer from gs:88 (gs:[0x58], which holds the linear address of the thread-local storage array), then loads a 64-bit pointer from TLS array pointer + _tls_index*8 (the index into the array multiplied by the pointer size). int a is then loaded from this pointer + its offset into the .tls section. Seeing as both variables use the same _tls_index, it suggests that there is one index per .exe, i.e. per .tls section; indeed, there is one _tls_index per TLS directory in .rdata, and the variables are packed together at the address pointed to by the TLS array entry. static thread_local variables in different translation units will be merged into .tls and all be packed together at the same index.
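The address computation in the listing can be modelled in portable C++. This is only a toy model under stated assumptions: ThreadModel, load_int, store_int, make_thread and square_model are hypothetical names, the vector of blocks stands in for the TLS array reached through gs:[0x58], and the module is assumed to have index 0 with a at offset 0 and i right after it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one thread: blocks[k] is that thread's private
// copy of module k's .tls data.
struct ThreadModel {
    std::vector<std::vector<unsigned char>> blocks;
};

// Mirrors the two dependent loads: array[_tls_index], then block + offset.
int load_int(const ThreadModel& t, std::uint32_t tls_index, std::size_t offset) {
    const std::vector<unsigned char>& block = t.blocks[tls_index];
    int value = 0;
    std::memcpy(&value, block.data() + offset, sizeof value);
    return value;
}

void store_int(ThreadModel& t, std::uint32_t tls_index, std::size_t offset, int value) {
    std::memcpy(t.blocks[tls_index].data() + offset, &value, sizeof value);
}

// Builds a thread whose only module starts from the template {a = 5, i = 5}.
ThreadModel make_thread() {
    const int init[2] = {5, 5};
    std::vector<unsigned char> image(sizeof init);
    std::memcpy(image.data(), init, sizeof init);
    ThreadModel t;
    t.blocks.push_back(image);
    return t;
}

// Same shape as square() above: both variables share one index and differ
// only in their offset inside the block.
int square_model(const ThreadModel& t) {
    const std::uint32_t tls_index = 0;  // assumed index of the module
    return load_int(t, tls_index, 0) * load_int(t, tls_index, sizeof(int));
}
```

Writing to a in one ThreadModel leaves every other ThreadModel untouched, which is the per-thread behaviour the real TLS array provides.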
I believe that mainCRTStartup, which the linker always includes in the final executable and makes the entry point when linking a console application, references the _tls_used variable (because every .exe needs its own index). _tls_used was pragma'd to go in the T fragment of .rdata in whichever object file within libcmt.lib defines it, and because mainCRTStartup references it, the linker will include it in the final executable. If the linker finds a reference to a _tls_used variable, it will make sure to include it and make the PE header's TLS directory entry point to it.
#pragma section(".rdata$T", long, read) //creates a read only section called `.rdata` if not created and a fragment T in the section
#define _CRTALLOC(x) __declspec(allocate(x))
#pragma data_seg() //set the compilers current default data section to `.data`
_CRTALLOC(".rdata$T") //place in the section .rdata, fragment T
const IMAGE_TLS_DIRECTORY _tls_used =
{
(ULONG)(ULONG_PTR) &_tls_start, // start of tls data in the tls section
(ULONG)(ULONG_PTR) &_tls_end, // end of tls data
(ULONG)(ULONG_PTR) &_tls_index, // address of tls_index
(ULONG)(ULONG_PTR) (&__xl_a+1), // pointer to callbacks
(ULONG) 0, // size of tls zero fill
(ULONG) 0 // characteristics
};
http://www.nynaeve.net/?p=183
_tls_used is a variable of type IMAGE_TLS_DIRECTORY, with the above initialised content, and it is actually defined in tlssup.c. Before this, the file defines _tls_index, _tls_start and _tls_end, placing _tls_start at the start of the .tls section and _tls_end at the end of it by putting _tls_end in the section fragment ZZZ, so that it sorts alphabetically to the end of the section:
#pragma data_seg(".tls") //set the compilers current default data section to `.tls`
#if defined (_M_IA64) || defined (_M_AMD64)
_CRTALLOC(".tls") //place the following in the section named `.tls`
#endif
char _tls_start = 0; //if not defined, place in the current default data section, which is also `.tls`
#pragma data_seg(".tls$ZZZ")
#if defined (_M_IA64) || defined (_M_AMD64)
_CRTALLOC(".tls$ZZZ")
#endif
char _tls_end = 0;
The addresses of these are then used as markers in the _tls_used TLS directory. The addresses will only be resolved by the linker once the .tls section is complete and each marker has a fixed relative location.
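The ordering trick can be illustrated with a toy model of how contributions to one output section are laid out by name. This is not linker code: Contribution, layout_order and demo_layout are hypothetical, and the ".tls$" name used for ordinary thread_local data is an assumption made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One contribution to the output section: the section name as written in
// the object file and the symbol it carries. All entries are illustrative.
struct Contribution {
    std::string section;
    std::string symbol;
};

// Toy model: order contributions by their full section name. ".tls" sorts
// before ".tls$", and ".tls$ZZZ" sorts after it.
std::vector<std::string> layout_order(std::vector<Contribution> parts) {
    std::stable_sort(parts.begin(), parts.end(),
                     [](const Contribution& a, const Contribution& b) {
                         return a.section < b.section;
                     });
    std::vector<std::string> symbols;
    for (const Contribution& p : parts) {
        symbols.push_back(p.symbol);
    }
    return symbols;
}

// Feeds the markers and two variables in scrambled order.
std::vector<std::string> demo_layout() {
    return layout_order({
        {".tls$ZZZ", "_tls_end"},
        {".tls$", "a"},  // assumed section name for ordinary thread_local data
        {".tls", "_tls_start"},
        {".tls$", "i"},
    });
}
```

Whatever order the object files arrive in, _tls_start ends up first and _tls_end last, so the two addresses bracket every thread_local variable in between.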
GCC (TLS is directly before FS base; raw data rather than pointers)
mov edx,DWORD PTR fs:0xfffffffffffffff8 //access thread_local int1 inside function
mov eax,DWORD PTR fs:0xfffffffffffffffc //access thread_local int2 inside function
Making one, both or none of the variables local produces identical code.
When the thread terminates, the thread library on Windows will deallocate the storage, presumably using TlsFree() calls (it must also free the heap memory pointed to by the pointer returned by TlsGetValue()).