I have read another discussion.

I know %gs is a segment register that refers to a segment descriptor. The descriptor is used to calculate the actual address. Most of the time, the segment descriptor is opaque to the programmer. I can use a trick such as intercepting the set_thread_area system call to get the value of %gs.

But most of what those discussions say is still too abstract for me, so I built a simple program to express my question. I hope someone can tell me what mistake I made in my example.

First of all, I wrote the following pthread code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

__thread int Sum = 123; // thread-local variable; 123 = 0x7b

void *show_msg( void *ptr ) {
 for( int x = 5 ; x > 0 ; --x){
    printf("%d\n", Sum++ ); // print the value of Sum and plus 1
    sleep(1);
 }
 pthread_exit((void *)1234);
}

int main(){
   pthread_t thread1;
   pthread_t thread2;
   void *ret; // receives each thread's exit value from pthread_join
   const char *message1 = "Thread 1";
   const char *message2 = "Thread 2";

   pthread_create(&thread1, NULL , show_msg , (void*) message1);
   pthread_create(&thread2, NULL , show_msg , (void*) message2);
   pthread_join( thread1, &ret);
   pthread_join( thread2, &ret);

   return 0;
}

I compile it with gcc test.cpp -lpthread -static -m32

Then I run objdump -D a.out. I post only the part of the result that I can't understand. Because a.out is a statically linked binary, it contains initialization code such as <__libc_setup_tls>.

08052510 <__libc_setup_tls>:
  ...
805262c: mov $0xf3,%eax ; syscall number 0xf3 is set_thread_area
8052631: mov %ebx,0x24(%esp)
805262c: lea 0x20(%esp),%ebx ; %ebx stores a pointer to struct user_desc
  ...
8052651: int $0x80

  ...

080496d4 <_Z8show_msgPv>:
  ...
80496f0: mov %gs:0xffffffd0,%eax
80496f6: lea 0x1(%eax),%edx
80496f9: mov %edx,%gs:0xffffffd0
  ...

I run a.out under gdb and set breakpoints at 0x805262c and 0x80496f0.

805262c: lea 0x20(%esp),%ebx ; %ebx stores a pointer to struct user_desc

After this instruction executes, the value of %ebx is 0xffffccd0. I know that 0xffffccd0 is a pointer to a struct user_desc, and that the memory at 0xffffccd4 stores the value of %gs, which is 0x080fd840.

Then I continue my debugging.

80496f0: mov %gs:0xffffffd0,%eax

I know the value of %gs is 0x63, which is the segment descriptor number and points to 0x080fd840. So I can calculate that %gs:0xffffffd0 is address 0x080fd810. The memory at 0x080fd810 stores 0x7b. I was excited to get this value, because 0x7b is the hexadecimal value of 123, the initial value of the global variable Sum.

But something strange happens when I execute the following instructions.

80496f6: lea 0x1(%eax),%edx ; yield %edx = 0x7c
80496f9: mov %edx,%gs:0xffffffd0 ; store 0x7c to %gs:0xffffffd0(????)

The result of the addition is not stored at 0x080fd810, the address I computed for %gs:0xffffffd0. Yet the next iteration of this thread reads 0x7c from %gs:0xffffffd0!

I traced the system calls with strace -c ./a.out. It shows that set_thread_area is called only once. That is, %gs is set only one time.

I think the OS changes something when a thread context switch occurs. Can anyone give me more detail and tell me why my reasoning is wrong in this case?

hwliu

1 Answer

The OS handles the memory for thread-local storage (TLS). It ensures both that %gs [or its base address] is updated when the next thread is loaded, and that memory is allocated [1] when a new thread is created.

The compiler and linker are responsible for calculating the size of the TLS and the respective offsets into it. In this case, it would seem that the implementation uses negative offsets from the base address, so your particular variable is at -0x30 from %gs.

[When you say "I know %gs is 0x080fd840", you mean that the base address for the segment is that value, right? %gs itself would be a 16-bit index into an x86 descriptor table.]

[1] This may mean that the OS just makes a virtual address available for the TLS, but that the actual allocation of PHYSICAL memory happens "as needed", in the same way that an executable file, shared library or large memory allocation is done.

Mats Petersson
  • Oops! I made a mistake in my description ;). The value of %gs is 0x63. I believe this number is the segment descriptor number, and that this descriptor points to 0x080fd840. Notice that the value of %gs never changes in my case; it is always 0x63, even when a thread context switch occurs. From your answer [the OS ensures that %gs is updated when the next thread is loaded], I conclude that the OS modifies the `content` of descriptor 0x63 when a new thread is loaded, so `0x080fd810` is not always equal to %gs:0xffffffd0. So far so good? – hwliu Nov 14 '15 at 13:28
  • Yes, what the OS actually does is update the `%gs` base address. – Mats Petersson Nov 14 '15 at 13:38
  • Actually, you could add some code to print the address of `Sum` in your code, and you'd notice that it's at different address in each thread. – Mats Petersson Nov 14 '15 at 13:39