Confused about the use of C++ (static) thread_local declared inside a function passed to (j)thread

Question

I have started to see a few C++ posts on Stack Overflow in which people suggest using thread_local within the function that is passed to (j)thread. For example:

How do I generate thread-safe uniform random numbers?

Say we have something like this:

#include <iostream>
#include <mutex>
#include <random>
#include <thread>

std::mutex cout_mutex; // protects std::cout

void thread_function()
{
    static thread_local std::default_random_engine gen;
    std::uniform_real_distribution<float> dist(0.0f, 1.f);
    unsigned int a{ 1 };

    float b = a * dist(gen);

    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "b: " << b << '\n';
    }
}

int main()
{
    std::jthread A(thread_function);
    std::jthread B(thread_function);
    A.join();
    B.join();

    return 0;
}

Aren't the random engine and the variable a both stored on the thread's stack? My understanding was that thread_local should be used like so:

I took this example from https://en.cppreference.com/w/cpp/language/storage_duration

#include <iostream>
#include <string>
#include <thread>
#include <mutex>

thread_local unsigned int rage = 1; 
std::mutex cout_mutex;

void increase_rage(const std::string& thread_name)
{
    ++rage; // modifying outside a lock is okay; this is a thread-local variable
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Rage counter for " << thread_name << ": " << rage << '\n';
}

int main()
{
    std::thread a(increase_rage, "a"), b(increase_rage, "b");
 
    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "Rage counter for main: " << rage << '\n';
    }
 
    a.join();
    b.join();
}

Possible output:

Rage counter for a: 2
Rage counter for main: 1
Rage counter for b: 2

In this particular case, it makes sense to me, since the variable rage is declared at the global scope but because it's declared as thread_local, each thread owns a similar variable that threads can edit independently from each other.

But then shouldn't this be equivalent to the following?

#include <iostream>
#include <string>
#include <thread>
#include <mutex>

std::mutex cout_mutex;

void increase_rage(const std::string& thread_name)
{
    unsigned int rage = 1; 
    ++rage; // modifying outside a lock is okay; this is a thread-local variable
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Rage counter for " << thread_name << ": " << rage << '\n';
}

int main()
{
    std::thread a(increase_rage, "a"), b(increase_rage, "b");
 
    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        //std::cout << "Rage counter for main: " << rage << '\n';
    }
 
    a.join();
    b.join();
}

Of course in this example rage isn't available in the main function any longer, yet this raises 2 questions:

1. Does it make sense to declare a variable thread_local in the thread function? Or is it intended to be used only as in the cppreference example (which would make the first example legal, but useless)?
2. If it does make sense to use it in the thread function as well (say in thread_function), what is the difference between the variable that is declared thread_local and the variable a that is not? To me both are stored on the thread's stack (and are "local to the thread").

Many thanks for your kind explanation.

Edit / Examples / Solution

For future readers, I thought it would be useful to add examples that show in practice the difference between a thread_local variable and one that isn't. Thanks for all the contributions; they all helped put the pieces of the puzzle together (there are, surprisingly, very few examples of this topic on the internet as of this writing). Note: my question was less about storage duration than about the difference between a variable declared thread_local inside a function called by a thread and one that is not, ideally with concrete examples showing the difference.

Example 1: recursion

At first I didn't understand the point of thread_local inside the function run by the thread, until @RaymondChen and @SolomonSlow mentioned recursion. It isn't obvious unless someone points it out, but the thread function may be called more than once on the same thread, for instance by calling itself. In that case declaring a variable thread_local inside the thread function makes sense (example below): its state stays "global" within the thread across successive calls. In the outcome, a keeps incrementing across the recursive calls, while b is re-initialized each time the function is called and so always prints its initial value.

#include <thread>
#include <mutex>
#include <iostream>

void thread_func()
{
    thread_local int a { 0 };
    int b{ 0 };

    {
        static std::mutex m;
        std::lock_guard<std::mutex> lock{ m };
        std::cout << "a: " << a++ << " b: " << b++ << std::endl;
    }

    if (a <= 2)
        thread_func();

}

int main()
{
    std::jthread a(thread_func);
    a.join();

    return 0;
}

Outcome:

a: 0 b: 0
a: 1 b: 0
a: 2 b: 0

Example 2: thread_local variable declared globally

I would expect this to be the more "typical" use of thread_local (at least, it is what the cppreference example shows): a variable is declared thread_local at global scope (the c variable in this example), so it can be used from the main function, yet each thread, including the main one, has its own copy and maintains its state independently of the other threads. That state is also maintained across any functions the thread function calls (in this example thread_func calls another_func).

#include <thread>
#include <mutex>
#include <iostream>

thread_local int c{ 0 };

void another_func()
{
    c++;
}

void thread_func(int id)
{
    thread_local int a { 0 };
    int b{ 0 };

    {
        static std::mutex m;
        std::lock_guard<std::mutex> lock{ m };
        std::cout << "id: " << id << " Results -> a: " << a++ << " b: " << b++ << " c: " << c << std::endl;
    }
    another_func();

    if (a <= 2)
        thread_func(id);
}

int main()
{
    std::jthread a(thread_func, 1);
    std::jthread b(thread_func, 2);

    a.join();
    b.join();

    std::cout << "Goodbye: " << c << std::endl;

    return 0;
}

Possible outcome:

id: 2 Results -> a: 0 b: 0 c: 0
id: 2 Results -> a: 1 b: 0 c: 1
id: 2 Results -> a: 2 b: 0 c: 2
id: 1 Results -> a: 0 b: 0 c: 0
id: 1 Results -> a: 1 b: 0 c: 1
id: 1 Results -> a: 2 b: 0 c: 2
Goodbye: 0

If `thread_function` is called more than once by a thread (e.g., due to recursion), then `gen` is shared by all the calls, but each call gets its own `a`. In the question you linked, the function is called many times by a single thread, so the `gen` gets reused. — Raymond Chen, Jul 11 '22 at 14:41
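
A minimal sketch of the point made in this comment, not code from the linked question. The helper names draw_with_fresh_engine and draw_with_thread_local_engine are hypothetical, and std::mt19937 is swapped in for std::default_random_engine only because its default-seeded sequence is fixed by the standard. A local engine is rebuilt on every call, so it keeps returning its first value; a thread_local engine is built once per thread and keeps advancing.

```cpp
#include <random>

// Hypothetical helper: the engine is an automatic variable, so it is
// re-created (with the same default seed) on every call.
std::mt19937::result_type draw_with_fresh_engine()
{
    std::mt19937 gen;
    return gen();
}

// Hypothetical helper: the engine is created once per thread, on that
// thread's first call, and keeps its state between calls.
std::mt19937::result_type draw_with_thread_local_engine()
{
    thread_local std::mt19937 gen;
    return gen();
}
```

Two calls to draw_with_fresh_engine() on one thread yield the same number, whereas two calls to draw_with_thread_local_engine() yield successive numbers from one sequence. Note that every thread's engine still starts from the same default seed unless it is seeded differently.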

score 3 · Answer 1 · answered Jul 11 '22 at 15:03

This is a question of storage duration.

1. automatic storage duration:

void increase_rage(const std::string& thread_name)
{
    unsigned int rage = 1;

Here rage is created and destroyed within the scope of each function invocation. Each time the function is invoked, a new instance is created and initialized to 1. So it won't work for the purpose of counting invocations.

2. static storage duration:

void increase_rage(const std::string& thread_name)
{
    static unsigned int rage = 1;  // or at global scope

Here the variable is allocated once in the program's data segment (not on the stack), so its value persists between invocations. There is only one instance, shared by all threads, and accessing it from multiple threads requires synchronization (for example with a mutex or std::atomic<int>).

3. thread storage duration:

void increase_rage(const std::string& thread_name)
{
    static thread_local unsigned int rage = 1;

Here the variable is allocated once for each thread (in thread-local storage, TLS) and deallocated when that thread ends. It can count function invocations per thread and does not require synchronization.

Note that static is implied when thread_local is used at block scope, so we can omit it here.
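
The three cases above can be compared side by side. This is only a sketch under stated assumptions: the function names and the max_count driver are hypothetical, and std::thread is used instead of std::jthread so that it compiles before C++20.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// 1. Automatic: a new variable per call, so the result is always 1.
int count_automatic()
{
    int calls = 0;
    return ++calls;
}

// 2. Static: one variable for the whole program, shared by all threads.
//    std::atomic keeps the concurrent increments race-free.
int count_static()
{
    static std::atomic<int> calls{0};
    return ++calls;
}

// 3. Thread: one variable per thread, no synchronization needed.
int count_thread_local()
{
    thread_local int calls = 0;
    return ++calls;
}

// Hypothetical driver: calls `counter` `calls_per_thread` times on each of
// `threads` threads and returns the largest value any call returned.
int max_count(int (*counter)(), int threads, int calls_per_thread)
{
    std::atomic<int> max_seen{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < calls_per_thread; ++i) {
                int value = counter();
                int seen = max_seen.load();
                // Raise max_seen to value unless another thread got there first.
                while (value > seen &&
                       !max_seen.compare_exchange_weak(seen, value)) {}
            }
        });
    }
    for (auto& th : pool) th.join();
    return max_seen.load();
}
```

With 4 threads making 5 calls each, the first use of max_count on count_automatic should report 1, on count_static 20 (all threads share one counter), and on count_thread_local 5 (each thread counts its own calls).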

Thanks, I think all your answers are good and complementary. But again, I think the key was provided by @RaymondChen in a comment on the original question, and I was missing that bit. I guess TLS means what he suggested: `gen` keeps whatever state it had if `thread_func` calls itself recursively, whereas `a` doesn't. — user18490, Jul 11 '22 at 15:13

Solomon Slow · Answer 2 · 2022-07-11T15:56:42.413

2

void thread_function()
{
    static thread_local std::default_random_engine gen;
    unsigned int a{ 1 };
    ...
}

"Isn't the random engine and the variable a both stored on the thread's stack?"

The gen variable is not stored on any stack. It's static. A static local variable can only be accessed from within the block where it is declared, but other than that, it behaves exactly like a global variable. It gets initialized one time before its first use, and after that it continues to exist for the lifetime of the program. Upon coming back into the block for the Nth time, it will have whatever value it had when some thread left the block for the (N-1)th time.

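The gen variable is also thread_local, which means that a separate instance of it exists for each thread that enters the block. Each instance is initialized the first time its thread passes through the declaration and is destroyed when that thread exits.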

The a variable is not static, and so it gets re-initialized every time any thread enters the block, and it is destroyed when the thread leaves the block.
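
The per-thread lifetime can be observed with a small sketch. Everything here is hypothetical scaffolding (the Tracker type, use_tracker, run_threads and the two global counters), and it assumes, as is the case on common implementations, that a thread's thread_local destructors have finished by the time join() returns.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

// Hypothetical type that records how many instances are created and destroyed.
struct Tracker {
    Tracker()  { ++constructed; }
    ~Tracker() { ++destroyed; }
    int uses = 0;
};

void use_tracker()
{
    thread_local Tracker tracker; // one instance per thread, built on first call
    ++tracker.uses;               // state persists across calls on this thread
}

// Hypothetical driver: each thread calls use_tracker() several times.
void run_threads(int threads, int calls_per_thread)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i)
                use_tracker();
        });
    }
    for (auto& th : pool) th.join();
}
```

After run_threads(3, 5), constructed and destroyed should both be 3: one Tracker per thread regardless of how many times each thread called use_tracker(), and each one destroyed when its thread ended rather than when the function returned.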

edited Jul 11 '22 at 15:56

answered Jul 11 '22 at 14:42

Solomon Slow


thx. I edited the code so the function `thread_func` does something with `gen` and `a`. How could you come back to the block once a thread has left it? I mean when the thread is done with work the thread should be destroyed so would the variables it held? Wouldn't there be also a different version of `a` for each thread? – user18490 Jul 11 '22 at 15:03
I guess I will accept the answer, but I would recommend that future readers look at @RaymondChen's comment on the original question, which provides a complementary insight to this answer (use of recursion). So, if I understand correctly, unless recursion or something similar is involved, using `thread_local` in this example probably doesn't matter. – user18490 Jul 11 '22 at 15:08
@user18490 Re, "How could you come back to the block once a thread has left it?" In this case, "the block" is the body of `thread_function()`. A thread could come back to the block by calling `thread_function()` more than one time. That does not happen in _your_ program, but it's a thing that could happen in other programs. – Solomon Slow Jul 11 '22 at 15:59
@user18490, Re, "when the thread is done with work the thread should be destroyed so would the variables it held?" Only the _automatic_ variables are destroyed when the function returns. `a` is automatic, but `gen` is not. Because `gen` is `thread_local`, each thread's copy of it lives until that thread exits (a plain `static` variable would live for the lifetime of the process). Also: The reason why automatic variable `a` gets destroyed isn't because the thread ended. It gets destroyed because `thread_function()` returned. Those are different events. In a different program that called `thread_function()` more than one time, `a` would be created and destroyed more than one time. – Solomon Slow Jul 11 '22 at 16:07
@user18490, The difference between RaymondChen's answer and mine is the focus. RaymondChen's answer focuses more on your specific problem. I was trying to focus more on what `static` and `thread_local` mean in general, and not just on how those keywords affect your example. – Solomon Slow Jul 11 '22 at 16:22
