
Assembly included. This weekend I tried to get my own small library running without any C libs, and the thread-local stuff is giving me problems. Below you can see I created a struct called Try1 (because it's my first attempt!). If I set the thread-local variable and use it, the code seems to execute fine. If I call a const method on Try1 with a global variable, it also seems to run fine. But if I do both, i.e. call the const method on the thread-local variable, it segfaults, even though I can access its members and can run the function on a global variable. The code prints Hello and Hello2 but not Hello3.

I suspect the problem is the address of the variable. I tried using an if statement to print the first hello: if ((s64)&t1 > (s64)buf+1024*16). It was true, which means the pointer isn't where I thought it was. It also isn't -8 as gdb suggests (it's a signed compare, and I tried 0 instead of buf).

The assembly is under the C++ code. Its first line is the first call to write.

//test.cpp
//clang++ or g++ -std=c++20 -g -fno-rtti -fno-exceptions -fno-stack-protector -fno-asynchronous-unwind-tables -static -nostdlib test.cpp -march=native && ./a.out
#include <immintrin.h>
#include <stdint.h>    // int64_t
#include <sys/types.h> // ssize_t, size_t
typedef unsigned long long int u64;

ssize_t my_write(int fd, const void *buf, size_t size) {
    register int64_t rax __asm__ ("rax") = 1;
    register int rdi __asm__ ("rdi") = fd;
    register const void *rsi __asm__ ("rsi") = buf;
    register size_t rdx __asm__ ("rdx") = size;
    __asm__ __volatile__ (
        "syscall"
        : "+r" (rax)
        : "r" (rdi), "r" (rsi), "r" (rdx)
        : "cc", "rcx", "r11", "memory"
    );
    return rax;
}

void my_exit(int exit_status) {
    register int64_t rax __asm__ ("rax") = 60;
    register int rdi __asm__ ("rdi") = exit_status;
    __asm__ __volatile__ (
        "syscall"
        : "+r" (rax)
        : "r" (rdi)
        : "cc", "rcx", "r11", "memory"
    );
}

struct Try1
{
    u64 val;
    constexpr Try1() { val=0; }
    u64 Get() const { return val; }
};

static char buf[1024*8]; //originally mmap but lets reduce code

static __thread u64 sanity_check;
static __thread Try1 t1;
static Try1 global;

extern "C"
int _start()
{
    auto tls_size = 4096*2;
    auto originalFS = _readfsbase_u64();
    _writefsbase_u64((u64)(buf+4096));

    global.val = 1;
    global.Get(); //Executes fine

    sanity_check=6;
    t1.val = 7;

    my_write(1, "Hello\n", sanity_check);
    my_write(1, "Hello2\n", t1.val); //Still fine
    my_write(1, "Hello3\n", t1.Get()); //crash! :/
    my_exit(0);
    return 0;
}

Asm:

4010b4:       e8 47 ff ff ff          call   401000 <_Z8my_writeiPKvm>
4010b9:       64 48 8b 04 25 f8 ff    mov    rax,QWORD PTR fs:0xfffffffffffffff8
4010c0:       ff ff 
4010c2:       48 89 c2                mov    rdx,rax
4010c5:       48 8d 05 3b 0f 00 00    lea    rax,[rip+0xf3b]        # 402007 <_ZNK4Try13GetEv+0xeef>
4010cc:       48 89 c6                mov    rsi,rax
4010cf:       bf 01 00 00 00          mov    edi,0x1
4010d4:       e8 27 ff ff ff          call   401000 <_Z8my_writeiPKvm>
4010d9:       64 48 8b 04 25 00 00    mov    rax,QWORD PTR fs:0x0
4010e0:       00 00 
4010e2:       48 05 f8 ff ff ff       add    rax,0xfffffffffffffff8
4010e8:       48 89 c7                mov    rdi,rax
4010eb:       e8 28 00 00 00          call   401118 <_ZNK4Try13GetEv>
4010f0:       48 89 c2                mov    rdx,rax
4010f3:       48 8d 05 15 0f 00 00    lea    rax,[rip+0xf15]        # 40200f <_ZNK4Try13GetEv+0xef7>
4010fa:       48 89 c6                mov    rsi,rax
4010fd:       bf 01 00 00 00          mov    edi,0x1
401102:       e8 f9 fe ff ff          call   401000 <_Z8my_writeiPKvm>
401107:       bf 00 00 00 00          mov    edi,0x0
40110c:       e8 12 ff ff ff          call   401023 <_Z7my_exiti>
401111:       b8 00 00 00 00          mov    eax,0x0
401116:       c9                      leave  
401117:       c3                      ret    
Peter Cordes
Eric Stotch
  • `register` is an unused keyword since C++17. – François Andrieux Oct 18 '21 at 20:04
  • @FrançoisAndrieux The assembly appears to not work without it – Eric Stotch Oct 18 '21 at 20:06
  • Feels like you would be better off just writing assembly directly. – François Andrieux Oct 18 '21 at 20:06
  • @FrançoisAndrieux I don't think making people copy/paste two separate files is worth the bother. It doesn't appear to affect my problem – Eric Stotch Oct 18 '21 at 20:07
  • @FrançoisAndrieux: `register __asm__` is a gcc/clang extension, used in conjunction with inline assembly. See https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables – Nate Eldredge Oct 18 '21 at 20:34
  • You don't need any of those flags or assembly to reproduce the segfault: https://gcc.godbolt.org/z/Yaaz5Yc1T `_writefsbase_u64((u64)(buf+4096));` is the cause. I don't think you can just change the FS segment register like that in userland. (Commenting it out also fixes your code: https://gcc.godbolt.org/z/ojo49Y577) –  Oct 18 '21 at 20:37
  • @NateEldredge I didn't know that, thanks. It seems weird to me that a compiler would use a reserved keyword for an extension. – François Andrieux Oct 18 '21 at 20:40
  • @Frank about fsbase, that's what I was thinking, but I can't seem to find documentation online. I definitely think it's because I'm using incorrect keywords. I'm specifically trying to do this without a standard C library, so I definitely need -nostdlib, which causes the segfault to happen even on godbolt. I think I have to use arch_prctl, but yesterday someone suggested I don't need arch_prctl and fsbase, so if that's true I'm doing something wrong – Eric Stotch Oct 18 '21 at 20:50
  • By the way, trying to write C++ with no support from the standard runtime is going to be full of "interesting" surprises. For instance, your constructor `Try1::Try1()` isn't being called on any of your `static` or `__thread` objects. The standard startup code would call static constructors before calling `main`, but since your program is its own startup code, it won't happen unless you do it yourself. – Nate Eldredge Oct 18 '21 at 21:26
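The point about static constructors in the comment above can be sketched as a small loop over constructor function pointers. This is only an illustration, not code from the question: `init_fn` and `run_init_array` are hypothetical names, and it assumes the toolchain places dynamic initializers in an `.init_array`-style table of `void (*)()` pointers, as GCC and Clang conventionally do on ELF targets.

```cpp
// Hypothetical alias for the kind of function pointer stored in .init_array.
using init_fn = void (*)();

// Calls every function pointer in [begin, end), in order. In a freestanding
// program, begin/end would typically be the linker-provided
// __init_array_start and __init_array_end symbols (an assumption about the
// default GNU ld / lld layout, not something shown in the question).
void run_init_array(init_fn *begin, init_fn *end) {
    for (init_fn *fn = begin; fn != end; ++fn)
        (*fn)();
}
```

In a `-nostdlib` program, `_start` would probably declare `extern "C" init_fn __init_array_start[], __init_array_end[];` and call `run_init_array(__init_array_start, __init_array_end)` before touching any object that has a non-constant initializer.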

1 Answer


The ABI requires that fs:0 contains a pointer with the absolute address of the thread-local storage block, i.e. the value of fsbase. The compiler needs access to this address to evaluate expressions like &t1, which here it needs in order to compute the this pointer to be passed to Try1::Get().

It's tricky to recover this address on x86-64, since the TLS base address isn't in a convenient general register, but in the hidden fsbase. It isn't feasible to execute rdfsbase every time we need it (expensive instruction that may not be available) nor worse yet to call arch_prctl, so the easiest solution is to ensure that it's available in memory at a known address. See this past answer and sections 3.4.2 and 3.4.6 of "ELF Handling for Thread-Local Storage", which is incorporated by reference into the x86-64 ABI.
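One way to see this requirement in action is to check it in an ordinary program that does link against libc, where the startup code has already set up the TLS block. The sketch below is an illustration under assumptions: `fs0_holds_fsbase` is a hypothetical helper, it uses GCC/Clang inline assembly in AT&T syntax, and it only does real work on x86-64 Linux (elsewhere it simply returns true).

```cpp
#include <cstdint>

#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>    // ARCH_GET_FS
#include <sys/syscall.h>  // SYS_arch_prctl
#include <unistd.h>       // syscall
#endif

// Returns true if the word stored at fs:0 equals the FS base reported by the
// kernel. On targets other than x86-64 Linux there is nothing to check, so
// this hypothetical helper just returns true.
bool fs0_holds_fsbase() {
#if defined(__x86_64__) && defined(__linux__)
    // Same load the compiler emits at 0x4010d9: mov rax, fs:0x0.
    std::uint64_t from_memory = 0;
    __asm__ volatile("movq %%fs:0, %0" : "=r"(from_memory));

    // Ask the kernel for the real FS base, without needing rdfsbase.
    unsigned long from_kernel = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &from_kernel) != 0)
        return false;
    return from_memory == from_kernel;
#else
    return true;
#endif
}
```

Under a libc that follows the ABI, the two values should agree, which is exactly the invariant a hand-rolled `_start` has to establish itself.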

In your disassembly at 0x4010d9, you can see the compiler trying to load from address fs:0x0 into rax, then adding -8 (the offset of t1 in the TLS block) and moving the result into rdi as the hidden this argument to Try1::Get(). Obviously since you have zeros at fs:0 instead, the resulting pointer is invalid and you get a crash when Try1::Get() reads val, which is really this->val.

I would write something like

void *fsbase = buf+4096;
_writefsbase_u64((u64)fsbase);
*(void **)fsbase = fsbase;

(Or memcpy(fsbase, &fsbase, sizeof(void *)) might be more compliant with strict aliasing.)
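To make the before/after difference concrete without touching the real `fs` register, here is a small emulation of the address computation. It is a sketch, not the compiler's actual mechanism: `store_self_pointer` and `emulated_tls_address` are hypothetical names, and the "thread pointer" is just an ordinary pointer into a buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store the thread pointer at offset 0 of the block it
// points to, as the fix above does. memcpy avoids strict-aliasing concerns.
void store_self_pointer(void *thread_pointer) {
    std::memcpy(thread_pointer, &thread_pointer, sizeof thread_pointer);
}

// Emulates `mov rax, fs:0x0` followed by `add rax, offset`: load the word
// stored at the thread pointer, then add the variable's (negative) TLS offset.
std::uintptr_t emulated_tls_address(const void *thread_pointer,
                                    std::ptrdiff_t offset) {
    std::uintptr_t self = 0;
    std::memcpy(&self, thread_pointer, sizeof self);
    return self + static_cast<std::uintptr_t>(offset);
}
```

With a zero-filled block, `emulated_tls_address(tp, -8)` yields the bit pattern of -8, i.e. the invalid `this` pointer behind the crash. After `store_self_pointer(tp)` it yields the address 8 bytes below `tp`, where `t1` actually lives.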

Nate Eldredge
  • Thank you, I finally have documentation in my hands! I was really hoping this would all be simple. I *thought* I just needed a block of memory and to put its address into fsbase. I thought the first thread-local variable (-8) was reading/writing `buffer+4096-8` since I set fsbase to `buffer+4096`. I was very, very wrong. I'll keep in mind that I should store the pointer somewhere easy to access and that there are (non-trivial?) ABI requirements – Eric Stotch Oct 18 '21 at 21:28
  • That's right, your data is at `(Try1*)((u64)fsbasePtr - 8)`. So to compute that address, the compiler needs to know the value of `fsbasePtr`. You know it is equal to `buffer+4096`, but the compiler doesn't. And short of `rdfsbase` it has no way to retrieve the value of `fsbasePtr`. Instructions like `mov reg, fs:[offset]` can *load* from the thread-local storage block, but they can't tell us its linear address (and no, `lea reg, fs:[offset]` won't work; it'll just return `offset`). – Nate Eldredge Oct 18 '21 at 21:59
  • So the value of `fsbasePtr` needs to be stored in memory at some canonical location where the compiler will know where to find it, and what better place than within the thread-local storage block itself? Thus the ABI requires that address to be stored at offset 0 in the TLS block, so that `mov rax, fs:0x0` will load `rax` with the correct value of `fsbasePtr`. So whoever sets up the TLS block is responsible for storing it there - i.e. you. – Nate Eldredge Oct 18 '21 at 22:01
  • This added complexity is the price we pay for using `fs` to access thread-local storage. It would be a lot simpler if we just kept the TLS pointer in some general-purpose register, say `r15`; then getting the `this` address would be as simple as `lea rdi, [r15-8]`, and loads and stores would be easy too. Other architectures do that. But we would then give up the ability to use `r15` for anything else, and x86-64 hasn't got all that many to spare. Hence this "hack" of using `fs`. – Nate Eldredge Oct 18 '21 at 22:08
  • (I guess the hack really came from x86-32, which is even more short-handed in terms of registers. And there reading the segment base is harder still, because it's not even in a hidden register, but in the local descriptor table maintained by the kernel.) – Nate Eldredge Oct 18 '21 at 22:10
  • Wow, thanks, it all worked after that. I had to manually call some constructors that used to use constexpr, but after that my 2K-line test lib worked – Eric Stotch Oct 18 '21 at 22:19
  • I still have no idea why `lea reg, fs:[offset]` doesn't return the address but that's a thing to learn another day. It's not important why it doesn't work. I just thought it did and didn't understand why my code was broken. I misread the assembly. I'm still fairly new to it and thought that mov was a lea with different syntax :facepalm: – Eric Stotch Oct 18 '21 at 22:21
  • I think asking for more solutions is pushing my luck, and I think I might run out of luck today. But do you happen to know how I can grab the data for constant global variables or for thread-locals? I was thinking of just hacking a script together that manually changes the bytes of my assembly, but it would be brittle and not a good idea. BUT I don't need to do that for this project, so I might not need a solution at all – Eric Stotch Oct 18 '21 at 22:37
  • Oh, I see. It doesn't help that the gas Intel format lets you omit the usual square brackets, and that the disassembler chooses to write it that way. `mov rax, fs:[0x0]` would have made it clearer. – Nate Eldredge Oct 18 '21 at 22:42
  • @EricStotch: I'm not sure what you mean by "grab the data". I don't know off the top of my head how initialization of thread-local variables is handled; probably copied by the startup code from some location specified via ELF headers. And then the thread library must have to do it again when each new thread is spun off. This should probably go to a new question. – Nate Eldredge Oct 18 '21 at 22:43
  • Yep, definitely. Thank you for all your help. I'll write the question when I actually need the solution. Right now what I have is good enough since all my variables (except maybe 2) are zero-initialized – Eric Stotch Oct 18 '21 at 22:47
  • @EricStotch: `lea reg, fs:[offset]` doesn't do anything with the segment-base because "effective address" is by definition just the offset part of a seg:off address that results from an addressing mode like `[base + idx*scale + disp]`. That's what makes it usable for address-math in a segmented memory model where all your segments (including DS) have non-zero bases (you don't want to add the seg_base before using with DS). [How is effective address calculated with fs and gs registers](https://stackoverflow.com/q/59797987) / [What is an effective address?](https://stackoverflow.com/q/36704481) – Peter Cordes Oct 18 '21 at 22:54
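The distinction drawn in the comments between what `lea` returns and what `mov` actually accesses can be written out as plain arithmetic. This is a simplified model for illustration only; `effective_address` and `linear_address` are hypothetical helper names, and the segment base value used with them is up to the caller.

```cpp
#include <cstdint>

// Effective address: the offset produced by [base + index*scale + disp].
// This is all that `lea` computes; no segment base is involved.
constexpr std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                          std::uint64_t scale, std::int64_t disp) {
    return base + index * scale + static_cast<std::uint64_t>(disp);
}

// Linear address: what a load or store such as `mov rax, fs:[disp]` actually
// accesses, i.e. segment base plus effective address (wrapping at 64 bits).
constexpr std::uint64_t linear_address(std::uint64_t segment_base,
                                       std::uint64_t ea) {
    return segment_base + ea;
}
```

For a displacement of -8 with no base or index register, `effective_address(0, 0, 1, -8)` is just the bit pattern of -8, which is what `lea reg, fs:[-8]` would produce. Only `linear_address(fsbase, ...)` lands 8 bytes below the FS base, and that sum is formed inside the memory access rather than in any register the program can read.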