
OK, say that I have

__thread int myVar;

And I then pass &myVar from one thread to another... If the data is truly "local", then the TLS storage of one thread may not be mapped into the other thread's address space, and in fact, you could argue that it shouldn't be. This would result in a SIGSEGV or something. However, the system could just map the same address to a different page in each thread. Is this what Linux does with .tbss/.tdata? In that case, passing the address of a variable would give you the address of the wrong variable! You'd get your own local copy and not the copy you tried to pass. Or is everything shared and mapped at different virtual addresses, allowing you to pass around addresses of __thread vars?

Obviously, one should be beaten and flogged for trying to pass thread-local storage to another thread by passing its address. There are a million other ways, such as copying it to any other variable! But I was curious whether anyone knew:

  1. The official described behavior in this situation
  2. The current GCC/Linux implementation details

-- Evan

VividD
Evan Langlois
    Threads *do not have* separate address spaces. They all share the address space of the process. That's one of several reasons why they are more lightweight than processes. Unclear what you're asking. – user207421 Sep 05 '16 at 00:51
    Please see the definition of __thread, which is thread LOCAL storage. This is a piece of data that is local to the thread and not shared. – Evan Langlois Sep 06 '16 at 14:47

2 Answers


For x86 at least, TLS is implemented using segment registers. The default segment register %ds is implicit in instructions that address memory. When accessing TLS, a thread uses another segment register (%gs on i386, %fs on x86-64) whose base address is saved and restored when a thread is scheduled, just as other registers are in a context switch.

So a process-wide variable might be accessed with something like:

mov (ADDR) -> REG ; load memory `myVar` to REG.

which is implicitly:

mov %DS:(ADDR) -> REG

For TLS, the compiler generates:

mov %FS:(ADDR) -> REG ; load thread-local variable `myVar` to REG.

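In effect, the offset (ADDR) encoded in the instruction is the same in every thread, but because each thread has a different base for that segment register, the accesses map to different regions of memory. Taking the address of the variable is a different matter: for

fprintf(stdout, "%p\n", (void *)&myVar); /* in separate threads... */

the compiler adds the segment base to the offset to form an ordinary pointer, so each thread prints a different address.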

The same scheme is used by Windows (which may swap the roles of %fs and %gs; not sure) and by OS X. As for other architectures, there's an in-depth technical guide to TLS for the ELF ABI. It's missing a discussion of the ARM architecture and has details on IA-64 and Alpha, so it's showing its age.

Brett Hale
    Wow. I didn't even think about segment registers (grew up on non-x86 machines)! So, Linux normally sets all segment registers to 0 for a flat address space, and leaves them unused, but when TLS gets into the picture, the compiler can set the fs/gs somewhere else and a previously unused register now tracks the location of our local data. Is this correct? I'm surprised the kernel bothered to load/save unused (it was my understanding that they used to be unused) segment registers. – Evan Langlois Jul 17 '14 at 15:37
    This answer was quite helpful! One minor quibble, you say "If you were to pass an address from one thread to another, you couldn't access the memory it represents in the first thread". The [GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html) contradict this, saying "An address so obtained may be used by any thread". – AnOccasionalCashew Feb 23 '20 at 03:22
  • @AnOccasionalCashew - I should have clarified that omitting the appropriate segment register (on IA32 / x86-64) would not yield a meaningful thread-local address - my current phrasing is misleading in this context. – Brett Hale Feb 24 '20 at 10:29

I had the same question, which brought me here, and so I have tried to verify what Brett and Cashew explained in the previous answer and comments.

Here is some example code to play with:

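#include <stdio.h>
#include <pthread.h>
#include <inttypes.h>
#include <unistd.h>
#define N 2

__thread int myVar;
int *commonVar;                 /* ordinary (non-TLS) pointer shared by all threads */
pthread_barrier_t done;         /* keeps every thread, and its TLS, alive until all are finished */

void *th(void *arg)
{
        int myid = *((int *)arg);
        myVar = myid;
        printf("thread %d set myVar=%d, &myVar=%p\n", myid, myVar, (void *)&myVar);
        sleep(1);
        printf("thread %d now has myVar=%d\n", myid, myVar);
        /* the sleeps order the accesses to commonVar: fine for a demo, not real synchronization */
        sleep(1 + myid);
        printf("thread %d sees this value at *commonVar=%d, commonVar=%p\n", myid, *commonVar, (void *)commonVar);
        commonVar = &myVar;
        printf("thread %d sets commonVar pointer to his myVar and now *commonVar=%d, commonVar=%p\n", myid, *commonVar, (void *)commonVar);
        /* do not exit while another thread may still dereference our &myVar */
        pthread_barrier_wait(&done);
        return NULL;
}

int main(void)
{
        int a = 123;
        pthread_t t[N];
        int arg[N];
        commonVar = &a;
        pthread_barrier_init(&done, NULL, N);

        printf("size of pointer: %lu bits\n", 8UL * sizeof(&a));
        for (int i = 0; i < N; i++)
        {
                arg[i] = i;
                pthread_create(&t[i], 0, th, arg + i);
        }
        for (int i = 0; i < N; i++)
                pthread_join(t[i], 0);
        pthread_barrier_destroy(&done);
        printf("all done\n");
        return 0;
}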

It generates the following output on 32-bit x86 (gcc -m32 -o a a.c -lpthread):

size of pointer: 32 bits
thread 0 set myVar=0, &myVar=0xf7d51b3c
thread 1 set myVar=1, &myVar=0xf7550b3c
thread 0 now has myVar=0
thread 1 now has myVar=1
thread 0 sees this value at *commonVar=123, commonVar=0xffabb390
thread 0 sets commonVar pointer to his myVar and now *commonVar=0, commonVar=0xf7d51b3c
thread 1 sees this value at *commonVar=0, commonVar=0xf7d51b3c
thread 1 sets commonVar pointer to his myVar and now *commonVar=1, commonVar=0xf7550b3c
all done

and on x64 (gcc -o a a.c -lpthread):

size of pointer: 64 bits
thread 0 set myVar=0, &myVar=0x7fe5ae27a6fc
thread 1 set myVar=1, &myVar=0x7fe5ada796fc
thread 0 now has myVar=0
thread 1 now has myVar=1
thread 0 sees this value at *commonVar=123, commonVar=0x7ffff6e3e04c
thread 0 sets commonVar pointer to his myVar and now *commonVar=0, commonVar=0x7fe5ae27a6fc
thread 1 sees this value at *commonVar=0, commonVar=0x7fe5ae27a6fc
thread 1 sets commonVar pointer to his myVar and now *commonVar=1, commonVar=0x7fe5ada796fc
all done

Observations: 1) thread-local storage (TLS) variables work as expected: every thread has its own copy that does not interfere with the others; and 2) a pointer to a TLS variable can be converted to a non-TLS pointer inside the owning thread and then used by the same or any other thread to access that thread's copy of the variable, as long as the owning thread is still alive. Let's look at how this is achieved at the assembly-code level:

First, here is the assembly code generated for the myVar = myid; line (gcc [-m32] -o a.asm a.c -lpthread -Xlinker -Map=output.map -S):

32-bit:

    movl    -12(%ebp), %eax
    movl    %eax, %gs:myVar@ntpoff

64-bit:

    movl    -4(%rbp), %eax
    movl    %eax, %fs:myVar@tpoff

So, as Brett mentioned, the GS (32-bit) and FS (64-bit) registers are used to address the TLS variable, leading to different linear and physical addresses in each thread.

Here is the assembly code generated for the commonVar = &myVar; line:

32-bit:

    movl    commonVar@GOT(%ebx), %eax
    movl    %gs:0, %ecx
    leal    myVar@ntpoff, %edx
    addl    %ecx, %edx
    movl    %edx, (%eax)

64-bit:

    movq    %fs:0, %rax
    addq    $myVar@tpoff, %rax
    movq    %rax, commonVar(%rip)

Thus we can see that a pointer to a TLS variable can be converted to a non-TLS pointer (which will use the default DS segment register). GCC compiles this by doing the segmentation arithmetic manually: it loads the thread pointer with an ordinary load from %gs:0 or %fs:0 (the TLS ABI stores a pointer to the thread control block at offset 0 of that block) and adds the variable's offset (myVar@ntpoff or myVar@tpoff) with an ADD instruction. This relies on the fact that the DS base is 0, so the resulting linear addresses (gs:myVar vs. ds:commonVar) are the same, and thus the paging part of the virtual address translation is also the same in both cases.

On a final note, it is interesting to see that when we printed the pointer to myVar (the very first line of each thread's output), we saw different addresses. That is because when that pointer is passed to the printf() function, it is first converted to a DS-based pointer. For example, on 64-bit it looks like this:

    ...
    movq    %fs:0, %rax
    leaq    myVar@tpoff(%rax), %rcx
    ...
    call    printf@PLT
Palo