GCC 4.8.0 added an implementation of thread_local from the C++11 standard.
The Changes page states that there may be a "run-time penalty":
G++ now implements the C++11 thread_local keyword; [...] Unfortunately, this support requires a run-time penalty for references to non-function-local thread_local variables defined in a different translation unit even if they don't need dynamic initialization, [...]. If the programmer can be sure that no use of the variable in a non-defining TU needs to trigger dynamic initialization (either because the variable is statically initialized, or a use of the variable in the defining TU will be executed before any uses in another TU), they can avoid this overhead with the -fno-extern-tls-init option.
Can anyone explain to me what G++ does for thread_local global variables?
- What is the general mechanism?
- What induces the overhead?
- How much overhead is involved per access? A pointer indirection? A costly lock?
- Under what circumstances is there no overhead, exactly?
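To make the distinction in the note concrete, this is a small sketch of what I understand by a statically versus a dynamically initialized thread_local. The types Plain and Dynamic and the two read functions are made up for illustration; how G++ actually treats each case is exactly what I am asking about.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type with a constexpr constructor: the variable below can be
// constant-initialized, so no constructor code has to run on first use.
struct Plain {
    int value;
    constexpr explicit Plain(int v) : value(v) {}
};

// Hypothetical type whose constructor allocates: the variable below needs
// dynamic initialization, once per thread.
struct Dynamic {
    std::vector<int> items;
    explicit Dynamic(std::size_t n) : items(n) {}
};

thread_local Plain plain_data{1000};      // static (constant) initialization
thread_local Dynamic dynamic_data{1000};  // dynamic initialization

// Accessors in the defining translation unit.
int read_plain() { return plain_data.value; }
std::size_t read_dynamic() { return dynamic_data.items.size(); }
```

If I read the note correctly, only a variable like dynamic_data can need initialization to be triggered from another translation unit.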
From the Changes note I assume, for example, that the following would have no overhead:
thread_local Data data { 1000 };

void worker() {
    for (auto &elem : data)
        elem.calculate();
}
because data is in the same translation unit?
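For experimenting, here is a self-contained, single-file version of that case. Data is not shown above, so the definition below (a thin wrapper around std::vector with a trivial calculate) is a stand-in I made up, as are sum_results and run_workers.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the real element type (an assumption for this sketch).
struct Elem {
    int result = 0;
    void calculate() { result += 1; }
};

// Stand-in for the real Data type: iterable, constructed from a size.
struct Data {
    std::vector<Elem> elems;
    explicit Data(std::size_t n) : elems(n) {}
    std::vector<Elem>::iterator begin() { return elems.begin(); }
    std::vector<Elem>::iterator end() { return elems.end(); }
};

// Defined and used in the same translation unit.
thread_local Data data { 1000 };

void worker() {
    for (auto &elem : data)
        elem.calculate();
}

// Sums the results stored in the calling thread's own copy of data.
int sum_results() {
    int sum = 0;
    for (auto &elem : data)
        sum += elem.result;
    return sum;
}

// Runs worker() on two extra threads and on the calling thread,
// and collects one sum per thread.
std::vector<int> run_workers() {
    std::vector<int> sums(3, 0);
    std::thread t1([&sums] { worker(); sums[0] = sum_results(); });
    std::thread t2([&sums] { worker(); sums[1] = sum_results(); });
    worker();
    sums[2] = sum_results();
    t1.join();
    t2.join();
    return sums;
}
```

With GCC this builds with -std=c++11 -pthread. Each thread works on its own freshly initialized copy of data, so on a first call every thread should report the same sum.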
And how does this change if worker and data are in different translation units? Is this an example of that?
// module.cpp
void worker();
thread_local Data data { 1000 };

void start() {
    worker();
}

// main.cpp
extern thread_local Data data; // correct decl?

void worker() {
    for (auto &elem : data)
        elem.calculate();
}
Does the use of data in worker now induce an overhead? Is that still the case even if it was start that kicked off worker?