Why can't I call std::thread::join() more than once for a given thread?

Question

I'm new to multi-threaded programming in C++, and I had trouble understanding the join() function while reading a book:

The act of calling join() also cleans up any storage associated with the thread, so the std::thread object is no longer associated with the now-finished thread; it isn’t associated with any thread. This means that you can call join() only once for a given thread; once you’ve called join(), the std::thread object is no longer joinable, and joinable() will return false.

What does "storage associated with the thread" specifically mean, and why is it cleaning up the storage associated with the thread when calling join()? Can anyone explain the principles behind this?

By definition, `join` joins to an active, or a terminated thread. Once done, the thread is dead. It is no more. It ceased to exist. It is an ex-thread. It's pining for the fjords. It will never be alive again, to be joined to. Cats have nine lives, but threads live just once. — Sam Varshavchik, Sep 13 '22 at 16:20
@SamVarshavchik, Re, "Once done, the thread is dead." That's a peculiarity of C++. In many other languages, it works the other way 'round. Once the thread is dead, `join()` becomes _possible._ In Java's standard library, for example, `t.join()` is a function that waits until thread `t` has terminated. It does nothing else. In particular, it does nothing at all _to_ thread `t`. All it does is wait, and it can be called any number of times on the same thread `t` by any number of other threads. — Solomon Slow, Sep 13 '22 at 20:18
@SamVarshavchik I don't know if I get it right. join() wait until the thread finishes its execution, cleans up its associated storage, and only then it can finish its own execution. Once that is done, the thread doesn't exist so can not call join() to it again. — SICSU, Sep 14 '22 at 02:37
"... and only then it continues executing". Everything else is correct. — Sam Varshavchik, Sep 14 '22 at 11:12

Solomon Slow · Answer 1 · 2022-09-14T17:10:26.047

What does "storage associated with the thread" specifically mean?

Mostly it means the thread's call stack. That is probably a few megabytes, but the C++ standard library doesn't make it easy to find out exactly how much space is allocated, nor does it provide any well-defined way to change it. See "How to set the stacksize with C++11 std::thread".
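For what it's worth, on POSIX systems you can at least ask the threading library which stack size it would use by default. This is only a sketch and assumes a POSIX platform with <pthread.h>; std::thread itself has no portable equivalent, and the helper name default_thread_stack_size is made up for illustration.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical helper (POSIX only, not part of std::thread):
// returns the stack size in bytes stored in a default-initialized
// thread attribute object, or 0 if the query fails.
std::size_t default_thread_stack_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);  // release the attribute object again
    return size;
}
```

The value depends on the platform and its configuration, so treat it as a hint about the order of magnitude rather than a guarantee.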

Edit:

My original answer (above) was written in haste, and it failed to answer the main question. What I was thinking about at the time was how the biggest chunk of memory that must be released when cleaning up a thread is the thread's stack.

Why can't I call std::thread::join() more than once...?

Because that's how the C++ standard library was designed. In most operating systems, the system call that creates a new thread must be balanced by a system call that destroys it, and the designers of the C++ standard library decided that, for any thread that is not detached, that second system call should happen within the one and only join() call.
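Here is a minimal sketch of what "the one and only join() call" looks like from the caller's side. The helper name second_join_is_rejected is hypothetical; the behavior it relies on is that joinable() returns false after a successful join(), and that calling join() on a non-joinable std::thread throws std::system_error.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Hypothetical helper: joins `t` once, then attempts a second join().
// Returns true if the second join() is rejected with std::system_error.
bool second_join_is_rejected(std::thread& t) {
    assert(t.joinable());        // precondition: t still owns a thread
    t.join();                    // first join: waits, then releases the thread
    if (t.joinable()) {
        return false;            // not expected after a successful join()
    }
    try {
        t.join();                // second join: there is nothing left to join
    } catch (const std::system_error&) {
        return true;             // reported as an error, not silently ignored
    }
    return false;
}
```

Usage would look like `std::thread t([] {}); bool rejected = second_join_is_rejected(t);`. In real code you would normally check joinable() first rather than rely on the exception.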

There are generally two parts to the resources that must be released when a thread ends: the "user space" part (i.e., what the application controls) and the "kernel space" part.

The user-space part consists mostly of the thread's call stack. It's typically going to be a contiguous chunk of several megabytes in the process's virtual address space. How it gets allocated depends on the operating system. In Windows, the thread stack is allocated by the CreateThread(...) WinAPI call, and it gets freed when the thread itself calls ExitThread(...), which is always the last thing that any Windows thread does before it terminates. In Linux, the application is responsible for allocating the stack before it creates the new thread, and for freeing the stack after it is no longer needed (in practice, the threading library does this on the program's behalf).

Besides the call stack, the user-space part also includes the std::thread object itself. That object is allocated and freed in exactly the same way as any other C++ object.
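To illustrate that the std::thread object is just an ordinary object that outlives the thread it once represented, here is a small sketch. The helper name run_two_threads_with_one_object is made up; the point is that after join() the object still exists, represents no thread, and can be move-assigned a fresh thread that gets its own single join().

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: reuses one std::thread object for two threads of
// execution and returns how many times the worker lambda ran.
int run_two_threads_with_one_object() {
    int runs = 0;
    std::thread t([&runs] { ++runs; });
    t.join();                                  // the first thread is gone...
    assert(t.get_id() == std::thread::id());   // ...the object represents no thread

    t = std::thread([&runs] { ++runs; });      // move-assign a brand-new thread
    assert(t.joinable());
    t.join();                                  // this join belongs to the second thread
    return runs;
}
```

The two increments do not race, because each join() completes before the next access to `runs`.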

The kernel-space part is relatively small: it's a record somewhere within the kernel's memory that holds the thread's state (e.g., "running," "runnable," "waiting for...," "dead") and the thread's context. The context is a snapshot of all of the CPU's registers that is taken every time the thread is preempted, and then loaded back into the CPU registers when it's time to let the thread run again.

TonySalimi · Answer 2 · 2022-09-14T20:18:17.343

What does "storage associated with the thread" specifically mean, and why is it cleaning up the storage associated with the thread when calling join()? Can anyone explain the principles behind this?

When a thread terminates, the OS keeps its return state in an entry of a kernel data structure. If another thread calls join() on the terminated joinable thread, that entry is removed. But if you never call join() on a joinable thread, a zombie thread is created. In other words, a zombie thread is a thread that has already terminated but whose status entry still exists in the kernel.

Because each zombie thread consumes some system resources, if enough zombie threads accumulate, it will eventually no longer be possible to create new threads.
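On the C++ side, a joinable std::thread that is destroyed without join() or detach() causes std::terminate to be called, so forgetting the join is not silently tolerated. One common way to make sure every thread is joined exactly once is a small RAII wrapper. The sketch below is only an illustration: JoiningThread and run_guarded_threads are hypothetical names (C++20's std::jthread offers similar join-on-destruction behavior out of the box).

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins in its destructor if still joinable.
class JoiningThread {
public:
    explicit JoiningThread(std::thread t) : thread_(std::move(t)) {}
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;
    ~JoiningThread() {
        if (thread_.joinable()) {
            thread_.join();      // the single join for this thread
        }
    }

private:
    std::thread thread_;
};

// Starts `count` short-lived threads, one after another, and returns
// how many of them finished.
int run_guarded_threads(int count) {
    std::atomic<int> finished{0};
    for (int i = 0; i < count; ++i) {
        JoiningThread guard(std::thread([&finished] { ++finished; }));
    }  // guard's destructor joins at the end of every iteration
    return finished.load();
}
```

Because each thread is joined before the next one starts, no terminated-but-unjoined thread is left behind.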

273K · Answer 3 · 2022-09-13T16:43:08.383

The thread storage has nothing to do with the thread stack, contrary to what the other answer says. When the thread terminates, all of its resources are freed except one entry in the OS kernel structures, which keeps the thread's exit status and possibly other data used by the OS. Joining just removes that entry. After that the OS forgets the thread forever, and further access to the thread is undefined behavior; for example, it can lead to unpredictably joining a newly created thread.
