GCC 4.8.0 added an implementation of thread_local from the C++11 standard. The Changes document states that there may be a "run-time penalty":

G++ now implements the C++11 thread_local keyword; [...] Unfortunately, this support requires a run-time penalty for references to non-function-local thread_local variables defined in a different translation unit even if they don't need dynamic initialization, [...].

If the programmer can be sure that no use of the variable in a non-defining TU needs to trigger dynamic initialization (either because the variable is statically initialized, or a use of the variable in the defining TU will be executed before any uses in another TU), they can avoid this overhead with the -fno-extern-tls-init option.
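As a rough, hand-written analogue of the kind of guarded access the note seems to describe (this is not what G++ actually emits, only a sketch for discussion), one can route every use of the variable through an accessor function. Here `data_wrapper`, `sum_three_passes`, `constructions_after_two_threads`, and the `Data` definition with its `constructions` counter are all hypothetical names made up for illustration. Each call pays for a function call plus a per-thread "already initialized?" check, and the object is constructed once per thread on first use:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's Data type; counts constructions.
struct Data {
    static std::atomic<int> constructions;
    std::vector<int> values;
    explicit Data(int n) : values(n, 1) { ++constructions; }
};
std::atomic<int> Data::constructions{0};

// Hand-written accessor (an assumption, not compiler output): every call
// checks a per-thread guard before returning the calling thread's object.
Data& data_wrapper() {
    thread_local Data data{1000};  // constructed on a thread's first call only
    return data;
}

// Uses the variable several times; only the first use on a thread constructs it.
long sum_three_passes() {
    long sum = 0;
    for (int pass = 0; pass < 3; ++pass)
        for (int v : data_wrapper().values) sum += v;
    return sum;
}

// Runs the worker on two fresh threads and returns how many Data objects
// were constructed in total.
int constructions_after_two_threads() {
    Data::constructions = 0;
    long a = 0, b = 0;
    std::thread t1([&] { a = sum_three_passes(); });
    std::thread t2([&] { b = sum_three_passes(); });
    t1.join();
    t2.join();
    assert(a == 3000 && b == 3000);
    return Data::constructions.load();
}
```

In this sketch the guard check is repeated on every call to `data_wrapper()`, which is the sort of per-access cost the questions below are about; whether G++ does something comparable for namespace-scope thread_local variables is exactly what is being asked.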

Can anyone explain to me what G++ does for thread_local global variables?

  • What is the general mechanism?
  • What induces the overhead?
  • How much overhead is involved per access? A pointer indirection? A costly lock?
  • Under what circumstances is there no overhead, exactly?

From the changes note I assume for example that this would not have overhead:

thread_local Data data { 1000 };

void worker() {
    for(auto &elem : data)
        elem.calculate();
}

because data is in the same translation unit?

And how does this change if worker and data are in different translation units? Is this an example of that?

// module.cpp

void worker();

thread_local Data data { 1000 };

void start() {
    worker();
}

// main.cpp

extern thread_local Data data; // correct decl?

void worker() {
    for(auto &elem : data)
        elem.calculate();
}

Does the use of data in worker now induce an overhead? Is that still the case even if it was start that kicked off worker?

towi