What does the thread_local mean in C++11?

Question

I am confused by the description of thread_local in C++11. My understanding is that each thread has its own copy of the local variables in a function. Global/static variables can be accessed by all threads (possibly with access synchronized using locks). And thread_local variables are visible to all threads but can only be modified by the thread for which they are defined? Is that correct?

Related question: https://stackoverflow.com/questions/22794382/are-c11-thread-local-variables-automatically-static — Hari, Apr 25 '23 at 19:43

paxdiablo · Accepted Answer · 2022-10-24T22:51:25.783

221

Thread-local storage duration is a term used to refer to data that seemingly has global or static storage duration (from the viewpoint of the functions using it) but, in actual fact, has one copy per thread.

It adds to the current options:

automatic (exists during a block or function);
static (exists for the program duration); and
dynamic (exists on the heap between allocation and deallocation).

Something that is thread-local is brought into existence at thread creation time (or, at the latest, before the thread first uses it) and disposed of when the thread finishes.

For example, think of a random number generator where the seed must be maintained on a per-thread basis. Using a thread-local seed means that each thread gets its own random number sequence, independent of all other threads.

If your seed was a local variable within the random function, it would be initialised every time you called it, giving you the same number each time. If it was a global, threads would interfere with each other's sequences.

Another example is something like strtok where the tokenisation state is stored on a thread-specific basis. That way, a single thread can be sure that other threads won't screw up its tokenisation efforts, while still being able to maintain state over multiple calls to strtok - this basically renders strtok_r (the thread-safe version) redundant.

Yet another example would be something like errno. You don't want separate threads modifying errno after one of your calls fails, but before you've had a chance to check the result.

This site has a reasonable description of the different storage duration specifiers.

edited Oct 24 '22 at 22:51

answered Aug 16 '12 at 09:13

paxdiablo


6

Using thread local doesn't solve the problems with `strtok`. `strtok` is broken even in a single threaded environment. – James Kanze Aug 16 '12 at 10:15
20

Sorry, let me rephrase that. It doesn't introduce any _new_ problems with strtok :-) – paxdiablo Aug 16 '12 at 10:16
15

Actually, the `r` stands for "re-entrant", which has nothing to do with thread safety. It's true that you can make some things work thread-safely with thread-local storage, but you can't make them re-entrant. – Kerrek SB Aug 16 '12 at 10:18
7

In a single-threaded environment, functions need to be re-entrant only if they are part of a cycle in the call graph. A leaf function (one that doesn't call other functions) is by definition not part of a cycle, and there is no good reason why `strtok` should call other functions. – MSalters Aug 16 '12 at 12:39
6

this would mess it up: `while (something) { char *next = strtok(whatever); someFunction(next); // someFunction calls strtok }` – japreiss Jun 25 '14 at 20:18
1

@MSalters: You get problems if you (try to) intertwine two `strtok` sequences in one thread; say, if you're processing two strings at the same time. That's where the reentrant variants come in handy (plus it's cleaner --- no globals are accessed). – Tim Čas Feb 11 '15 at 17:47
2

Does a `thread_local` object call its deallocator at the end of the thread? – Dr. Jekyll Jan 27 '17 at 13:33
+1 Great example for `strtok`. I checked glibc from the tip: the implementation of `strtok` is just two lines and calls `strtok_r`. – haxpor Oct 15 '19 at 22:26
Some code samples would be nice – Ayberk Özgür Nov 08 '21 at 13:42

score 184 · Answer 2 · edited Jul 31 '19 at 22:56

When you declare a variable thread_local then each thread has its own copy. When you refer to it by name, then the copy associated with the current thread is used. e.g.

#include <iostream>
#include <thread>

thread_local int i=0;

void f(int newval){
    i=newval;
}

void g(){
    std::cout<<i;
}

void threadfunc(int id){
    f(id);
    ++i;
    g();
}

int main(){
    i=9;
    std::thread t1(threadfunc,1);
    std::thread t2(threadfunc,2);
    std::thread t3(threadfunc,3);

    t1.join();
    t2.join();
    t3.join();
    std::cout<<i<<std::endl;
}

This code will output "2349", "3249", "4239", "4329", "2439" or "3429", but never anything else. Each thread has its own copy of i, which is assigned to, incremented and then printed. The thread running main also has its own copy, which is assigned to at the beginning and then left unchanged. These copies are entirely independent, and each has a different address.

It is only the name that is special in that respect --- if you take the address of a thread_local variable then you just have a normal pointer to a normal object, which you can freely pass between threads. e.g.

#include <iostream>
#include <thread>

thread_local int i=0;

void thread_func(int*p){
    *p=42;
}

int main(){
    i=9;
    std::thread t(thread_func,&i);
    t.join();
    std::cout<<i<<std::endl;
}

Since the address of i is passed to the thread function, the copy of i belonging to the main thread can be assigned to even though it is thread_local. This program will thus output "42". If you do this, you need to take care that *p is not accessed after the thread it belongs to has exited; otherwise you get a dangling pointer and undefined behaviour, just like any other case where the pointed-to object is destroyed.

thread_local variables are initialized "before first use", so if they are never touched by a given thread then they are not necessarily ever initialized. This is to allow compilers to avoid constructing every thread_local variable in the program for a thread that is entirely self-contained and doesn't touch any of them. e.g.

#include <iostream>
#include <thread>

struct my_class{
    my_class(){
        std::cout<<"hello";
    }
    ~my_class(){
        std::cout<<"goodbye";
    }
};

void f(){
    thread_local my_class unused;
}

void do_nothing(){}

int main(){
    std::thread t1(do_nothing);
    t1.join();
}

In this program there are 2 threads: the main thread and the manually-created thread. Neither thread calls f, so the thread_local object is never used. It is therefore unspecified whether the compiler will construct 0, 1 or 2 instances of my_class, and the output may be "", "hellohellogoodbyegoodbye" or "hellogoodbye".

I think it is important to note that the thread-local copy of the variable is a newly initialized copy of the variable. That is, if you add a `g()` call to the beginning of `threadfunc`, then the output will be `0304029` or some other permutation of the pairs `02`, `03`, and `04`. That is, even though 9 is assigned to `i` before the threads are created, the threads get a freshly constructed copy of `i` where `i=0`. If `i` is initialized with `thread_local int i = random_integer()`, then each thread gets a new random integer. — Mark H, Jun 11 '17 at 23:21
Not exactly a permutation of `02`, `03`, `04`, there may be other sequences like `020043` — Hongxu Chen, Sep 17 '18 at 14:04
Interesting tidbit I just found: GCC supports using the address of a thread_local variable as template argument, but other compilers do not (as of this writing; tried clang, vstudio). I'm not sure what the standard has to say about that, or if this is an unspecified area. — jwd, Jul 11 '20 at 17:12

score 31 · Answer 3 · answered Aug 16 '12 at 09:23

Thread-local storage is in every respect like static (= global) storage, except that each thread has a separate copy of the object. The object's lifetime starts either at thread start (for global variables, although the implementation may defer initialization until first use) or at first initialization (for block-local statics), and ends when the thread ends (i.e. when the thread function returns and the thread exits; join() only waits for that to happen).

Consequently, only variables that could also be declared static may be declared as thread_local, i.e. global variables (more precisely: variables "at namespace scope"), static class members, and block-static variables (in which case static is implied).

As an example, suppose you have a thread pool and want to know how well your work load was being balanced:

thread_local Counter c;

void do_work()
{
    c.increment();
    // ...
}

int main()
{
    std::thread t(do_work);   // your thread-pool would go here
    t.join();
}

This would print thread usage statistics, e.g. with an implementation like this (which has to be defined before the declaration of c above, and needs <iostream> and <thread>):

struct Counter
{
     unsigned int c = 0;
     void increment() { ++c; }
     ~Counter()
     {
         std::cout << "Thread #" << std::this_thread::get_id() << " was called "
                   << c << " times" << std::endl;
     }
};

Did you not mean `std::this_thread::get_id()` in your `std::cout`? — Franky, Jul 07 '23 at 09:55
