What does "storage associated with the thread" specifically mean?
Mostly it means the thread's call stack. That is probably a few megabytes, but the standard library doesn't make it easy to find out exactly how much space is allocated, nor does it provide any well-defined way to change it. See "How to set the stacksize with C++11 std::thread".
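For what it's worth, here is a minimal sketch of the usual workaround on POSIX systems: bypass std::thread and set the size through a pthread attribute object. This is an assumption about your platform (it won't compile on Windows), and the names run_with_stack_size and mark_done are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Thread entry point: sets a flag so the caller can see that it ran.
static void* mark_done(void* arg) {
    *static_cast<bool*>(arg) = true;
    return nullptr;
}

// Hypothetical helper: starts a POSIX thread with the requested stack size
// and joins it. Returns the size recorded in the attribute object, or 0 if
// any step failed.
std::size_t run_with_stack_size(std::size_t requested_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;

    std::size_t recorded = 0;
    bool ran = false;
    pthread_t tid;

    if (pthread_attr_setstacksize(&attr, requested_bytes) == 0 &&
        pthread_attr_getstacksize(&attr, &recorded) == 0 &&
        pthread_create(&tid, &attr, mark_done, &ran) == 0) {
        pthread_join(tid, nullptr);  // wait for the thread, then release it
    }
    pthread_attr_destroy(&attr);
    return ran ? recorded : 0;
}
```

Calling run_with_stack_size(1 << 20) asks for a 1 MiB stack. The request can be rejected if it is smaller than PTHREAD_STACK_MIN, in which case the helper returns 0.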
Edit:
My original answer (above) was written in haste, and it failed to answer the main question. What I was thinking about at the time was how the biggest chunk of memory that must be released when cleaning up a thread is the thread's stack.
Why can't I call std::thread.join() more than once...?
Because that's how the C++ standard library was designed. In most operating systems, the system call that creates a new thread must be balanced by a system call that destroys it, and the designers of the C++ standard library decided that, for any thread that is not detached, that second system call should happen within the one and only join() call.
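You can see the "one and only" rule from the outside with a small experiment. This is only a sketch (the function name second_join_is_rejected is mine): after the first join() the object is no longer joinable, and the standard specifies that calling join() on a non-joinable thread throws std::system_error.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Returns true if a second join() on the same std::thread is rejected.
bool second_join_is_rejected() {
    std::thread worker([] { /* nothing to do */ });

    assert(worker.joinable());   // started and not yet joined
    worker.join();               // the one and only join: wait, then release
    assert(!worker.joinable());  // the object no longer refers to a thread

    try {
        worker.join();           // there is nothing left to join
    } catch (const std::system_error&) {
        return true;             // rejected, as the standard requires
    }
    return false;
}
```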
There are generally two parts to the resources that must be released when a thread ends: the "user space" part (i.e., what the application controls) and the "kernel space" part.
The user-space part consists mostly of the thread's call stack. It's typically going to be a contiguous chunk of several megabytes in the process's virtual address space. How it gets allocated depends on the operating system. In Windows, the thread stack is allocated by the CreateThread(...) WinAPI call, and it gets freed when the thread itself calls ExitThread(...), which is always the last thing that any Windows thread does before it terminates. In Linux, user-space code (normally the threading library, acting on the application's behalf) is responsible for allocating the stack before it creates the new thread, and for freeing the stack after it is no longer needed.
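To make the Linux case concrete, here is a rough sketch of doing by hand what the threading library normally does for you: allocate a stack, start a thread on it, and free the stack only after the thread has been joined. It assumes a Linux-like system with mmap and POSIX threads; the 1 MiB size and the names run_on_caller_allocated_stack and set_flag are arbitrary choices for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <sys/mman.h>

// Thread entry point: sets a flag so the caller can see that it ran.
static void* set_flag(void* arg) {
    *static_cast<bool*>(arg) = true;
    return nullptr;
}

// Hypothetical helper: runs one thread on a stack that the caller owns.
bool run_on_caller_allocated_stack() {
    const std::size_t stack_bytes = std::size_t{1} << 20;  // 1 MiB, arbitrary

    // 1. User space allocates the stack before the thread exists.
    void* stack = mmap(nullptr, stack_bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED) return false;

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        munmap(stack, stack_bytes);
        return false;
    }

    bool ran = false;
    pthread_t tid;
    bool started = pthread_attr_setstack(&attr, stack, stack_bytes) == 0 &&
                   pthread_create(&tid, &attr, set_flag, &ran) == 0;

    // 2. Joining waits until the thread has finished using the stack.
    if (started) pthread_join(tid, nullptr);
    pthread_attr_destroy(&attr);

    // 3. Only now is it safe for user space to free the stack.
    munmap(stack, stack_bytes);
    return started && ran;
}
```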
Besides the call stack, the user-space part also includes the std::thread object itself. The thread object is allocated and freed exactly the same way as any other C++ object.
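A small sketch of what "like any other C++ object" means in practice (the function name is made up for the example): a std::thread can live on the stack or inside a container, and it can be moved but not copied. The one special rule is that it must be joined or detached before it is destroyed, otherwise its destructor calls std::terminate.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Returns true if ownership of the running thread moved from a local
// std::thread object into one stored in a vector.
bool thread_object_is_an_ordinary_object() {
    std::vector<std::thread> pool;     // storage managed by the vector

    std::thread local([] {});          // automatic storage
    pool.push_back(std::move(local));  // movable, not copyable

    // The moved-from object no longer refers to a thread; the one in the
    // vector does.
    bool moved = !local.joinable() && pool.front().joinable();

    // Join before the vector destroys the object.
    pool.front().join();
    return moved;
}
```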
The kernel-space part is relatively small: It's a record somewhere within the kernel's memory that holds the thread's state (e.g., "running," "runnable," "waiting for...," "dead,") and it holds the thread's context. The context is a snapshot of all of the CPU's registers that gets taken every time the thread is preempted, and then loaded back into the CPU registers when it's time to let the thread run again.