My problem

I am trying to write a shared library (not an executable, so please do not tell me to use -no-pie) with assembly and C in separate files (not inline assembly).

I would like to access a C global variable through the Global Offset Table in assembly code, because the symbol might be defined in another shared library.

I know the PLT/GOT stuff, but I do not know for sure how to tell the assembler to generate the correct relocation information for the linker (what is the syntax?), or how to tell the linker to actually relocate my code with that information (what are the linker options?).

My code compiles but fails with a linking error

/bin/ld: tracer.o: relocation R_X86_64_PC32 against
/bin/ld: final link failed: bad value

Furthermore, it would be great if someone could share detailed documentation on relocations in GAS assembly, for example an exhaustive list of how to interoperate between C and assembly with the GNU assembler.

Source Code

Compile the C and assembly code and link them into ONE shared library.

# Makefile
liba.so: tracer2.S target2.c
    gcc -shared -g -o liba.so tracer2.S target2.c

// target2.c
// NOTE: This is a variable, not a function.
int (*read_original)(int fd, void *data, unsigned long size) = 0;

// tracer2.S
.text
    // external symbol declaration
    .global read_original
read:
  lea read_original(%rip), %rax
  mov (%rax), %rax
  jmp *%rax

Expectation and Result

I expect the linker to happily link my object files but it says

g++ -shared -g -o liba.so tracer2.o target2.c -ldl
/bin/ld: tracer.o: relocation R_X86_64_PC32 against
/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make: *** [Makefile:2: liba.so] Error 1

and commenting out the line

// lea read_original(%rip), %rax

makes the error disappear.

Solution.

    lea read_original@GOTPCREL(%rip), %rax

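The @GOTPCREL modifier tells the assembler to emit a PC-relative relocation against the symbol's GOT entry. The linker then calculates the offset from the current rip to that GOT entry. Note that lea only yields the address of the GOT entry itself, so the following mov (%rax), %rax loads the address of read_original, not the function pointer stored in it. To jump through the pointer, load the GOT entry with mov read_original@GOTPCREL(%rip), %rax instead of lea (as the compiler output in the answer below does), and then dereference that address to get the pointer.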

You can verify with

$ objdump -d liba.so
    10e9:       48 8d 05 f8 2e 00 00    lea    0x2ef8(%rip),%rax        # 3fe8 <read_original@@Base-0x40>
    10f0:       48 8b 00                mov    (%rax),%rax
    10f3:       ff e0                   jmpq   *%rax

Thanks to Peter.

Some information that might be related or not

1. I can call a C function with
  call read@plt

objdump shows it calls into the correct PLT entry.

$ objdump -d liba.so
...
0000000000001109 <read1>:
    1109:       e8 22 ff ff ff          callq  1030 <read@plt>
    110e:       ff e0                   jmpq   *%rax

2. I can lea a PLT entry address correctly

0xffffff23 is -0xdd, and 0x1109 - 0xdd = 0x102c

0000000000001020 <.plt>:
    1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       ff 25 e4 2f 00 00       jmpq   *0x2fe4(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
    102c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000001030 <read@plt>:
    1030:       ff 25 e2 2f 00 00       jmpq   *0x2fe2(%rip)        # 4018 <read@GLIBC_2.2.5>
    1036:       68 00 00 00 00          pushq  $0x0
    103b:       e9 e0 ff ff ff          jmpq   1020 <.plt>

0000000000001109 <read1>:
    1109:       48 8d 04 25 23 ff ff    lea    0xffffffffffffff23,%rax
    1110:       ff
    1111:       ff e0                   jmpq   *%rax

Environment

  • Arch Linux 20190809
$ uname -a
Linux alex-arch 5.2.6-arch1-1-ARCH #1 SMP PREEMPT Sun Aug 4 14:58:49 UTC 2019 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto
Thread model: posix
gcc version 9.1.0 (GCC)
$ ld --version
GNU ld (GNU Binutils) 2.32
Copyright (C) 2019 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
Alex Wang
  • `.global read_original` doesn't define the symbol, it only marks it for export *from* this object if / when it is defined with a `read_original:` label. Shared libraries have to go through the GOT to access static data in the main executable or in another shared library because they can be loaded more than 2GB away, and because the offset isn't a compile-time constant. (Look at C compiler output for accessing `extern int read_original`.) Also, your LEA/JMP is equivalent to `jmp read_original`, but worse. Did you mean `mov` / `jmp` to load a function pointer? – Peter Cordes Aug 09 '19 at 04:28
  • That looks suspiciously like the beginnings of another illuminating @PeterCordes answer `:)` – David C. Rankin Aug 09 '19 at 04:30
  • Do you actually need your asm to be in a shared library *separate* from the C that defines the global? Normally you'd just link the `.c` and `.s` together into one shared library. In asm you can access globals within your own shared object with normal RIP-relative addressing. – Peter Cordes Aug 09 '19 at 04:31
  • Hi @PeterCordes, 1. I do not think `.global read_original` defines the symbol, either. The symbol is defined in `target2.c`; 2. I know we need to go through the GOT, and I'm asking how exactly to do that; 3. lea/jmp is just a demonstration of the problem, so please don't mind the time cost :) – Alex Wang Aug 09 '19 at 04:36
  • @PeterCordes My asm and C are in the same shared library `liba.so`, I wonder if you understand my problem correctly. I AM linking them in one shared library. And RIP-relative addressing is EXACTLY what I am trying to do. Please let me know if I haven't made it clear. – Alex Wang Aug 09 '19 at 04:41
  • If the symbol is defined in `target2.c` then no, you don't need to go through the GOT. You can access it directly. You're compiling with `g++` ; did you use `extern "C"` to disable C++ name-mangling so the asm name matches the C name? Why are you compiling a `.c` file with a C++ compiler? – Peter Cordes Aug 09 '19 at 04:44
  • (And BTW, yes I was fooled earlier by your `gcc -shared -c tracer2.s` command line. Even though you passed `-shared`, it isn't actually linking the `.s` into a separate shared library, only assembling into a `.o` so `-shared` does nothing there. And I hadn't noticed the compiling-as-C++ problem so I assumed if your C did define the symbol you wouldn't have gotten the error if they were linked together into the same exe or .so) – Peter Cordes Aug 09 '19 at 04:46
  • @PeterCordes 1. Yes I don't have to use the GOT, but in this case I need to access the GOT because we allow `read_original` to be defined elsewhere, e.g. some LD_PRELOAD'ed shared library. 2. I did use extern "C" and this is originally C++ code, but to make it shorter and easier to post on SO, I deleted them and changed it to `.c`. Changing `g++` to `gcc` does not help. :) I updated the Makefile to be less confusing – Alex Wang Aug 09 '19 at 04:50
  • Oh, apparently I'm mistaken; I thought `gcc` would have worked, but I tried it and reproduced your result. Maybe we need to mark the symbol's ELF visibility as "hidden" for the linker to be happy with accessing it directly. But you say you definitely need the symbol to support symbol-interposition? You didn't say that earlier. Put that in the question. See also https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ – Peter Cordes Aug 09 '19 at 04:55
  • @PeterCordes `lea read_original@GOTPCREL(%rip), %rax` works for me! But where can I find the documents about these magic `function_name@blabla` things? – Alex Wang Aug 09 '19 at 05:09

1 Answer

Apparently the linker enforces global vs. hidden visibility for symbols in ELF shared objects, not allowing "back door" access to symbols that participate in symbol-interposition (and thus can potentially be more than 2GB away.)

To access it directly from other code in the same shared object with normal RIP-relative addressing, make the symbol hidden by setting its ELF visibility as such. (See also https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ and Ulrich Drepper's How to Write Shared Libraries)

__attribute__ ((visibility("hidden")))
 int (*read_original)(int fd, void *data, unsigned long size) = 0;

Then gcc -save-temps tracer2.S target2.c -shared -fPIC compiles/assembles + links a shared library. GCC also has options like -fvisibility=hidden that make hidden the default, requiring explicit attributes on symbols you do want to export for dynamic linking. That's a very good idea if you have any globals that you use inside your library, to get the compiler to emit efficient code for using them. It also protects you from global name clashes with other libraries. The GCC manual strongly recommends it.
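As a rough sketch of that pattern (assuming GCC or Clang on an ELF target; the macro names LIBA_EXPORT and LIBA_LOCAL and the identifiers below are made up for illustration, not from the question), you mark the few symbols you want exported and keep everything else hidden:

```c
/* Hypothetical helper macros; the names are assumptions for this sketch. */
#define LIBA_EXPORT __attribute__((visibility("default")))
#define LIBA_LOCAL  __attribute__((visibility("hidden")))

/* Internal state: hidden, so code inside the library can reach it
 * directly instead of going through the GOT. */
LIBA_LOCAL int call_count = 0;

/* Public entry point: explicitly exported for dynamic linking. */
LIBA_EXPORT int liba_record_call(void)
{
    return ++call_count;
}
```

When building with -fvisibility=hidden, the LIBA_LOCAL annotation becomes redundant and only the exported symbols need an attribute.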

It also works with g++; C++ name mangling only applies to function names, not variables (including function-pointers). But generally don't compile .c files with a C++ compiler.


If you do want to support symbol interposition, you need to use the GOT; obviously you can just look at how the compiler does it:

int glob;                 // with default visibility = default
int foo() { return glob; }

compiles to this asm with GCC -O3 -fPIC (without any visibility options, so global symbols are fully globally visible: exported from shared objects and participating in symbol interposition).

foo:
        movq    glob@GOTPCREL(%rip), %rax
        movl    (%rax), %eax
        ret

Obviously this is less efficient than mov glob(%rip), %eax so prefer keeping your global vars scoped to the library (hidden), not truly global.
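For the function pointer in the question, the same trick applies: write the C you want and look at the asm. A minimal sketch, where `forward_read`, `fake_read`, and `install_fake_read` are hypothetical names added only so the snippet can be exercised on its own:

```c
/* Same definition as in target2.c (default visibility, so interposable). */
int (*read_original)(int fd, void *data, unsigned long size) = 0;

/* Hypothetical stand-in for a real read implementation: it just
 * reports the requested size as "bytes read". */
static int fake_read(int fd, void *data, unsigned long size)
{
    (void)fd;
    (void)data;
    return (int)size;
}

/* Hypothetical helper that points read_original at the stand-in. */
void install_fake_read(void)
{
    read_original = fake_read;
}

/* What the asm stub is meant to do: load the pointer, then call through it. */
int forward_read(int fd, void *data, unsigned long size)
{
    return read_original(fd, data, size);
}
```

Compiling this with something like gcc -O2 -fPIC -S should show the address of read_original being loaded via @GOTPCREL before the indirect jump or call, which is the sequence to copy into hand-written asm.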

There are tricks you can do with weak aliases to let you export a symbol that only this library defines, and access that definition efficiently via a "hidden" alias.
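One way to sketch the hidden-alias idea in C, assuming GCC or Clang on an ELF target. The names `exported_value`, `exported_value_local`, and `get_value` are made up for illustration, and this uses a plain (non-weak) alias with hidden visibility rather than a weak one:

```c
/* Exported definition: default visibility, so other objects can see it. */
int exported_value = 42;

/* Hidden alias for the same object. Code inside this library can use it
 * without going through the GOT, but it always refers to this library's
 * definition, even if exported_value is interposed elsewhere. */
extern __typeof__(exported_value) exported_value_local
    __attribute__((alias("exported_value"), visibility("hidden")));

/* Internal accessor that reads through the hidden alias. */
int get_value(void)
{
    return exported_value_local;
}
```

The trade-off is that code using the alias no longer honors symbol interposition, so this only fits symbols that this library alone is supposed to define.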

Peter Cordes
  • Thanks Peter, `GOTPCREL` is exactly what I am asking! And I wonder where I can find an exhaustive list of these `glob@blabla` magics. Also thanks for your blog post. But I think in C/C++, experimenting yourself and making assumptions from the results invites undefined behavior. So I don't think mimicking the compiler output is a good idea. It is not a substitute for documentation and manuals. – Alex Wang Aug 09 '19 at 05:16
  • @AlexWang: It's not my blog *post*, just my *link* to Thiago's blog. But anyway, hopefully the syntax is documented in the GAS manual (https://sourceware.org/binutils/docs/as/). The details about why / how you'd use the GOT is maybe in the ABI doc? I'm not sure. The GOT holds addresses of symbols and the dynamic linker fills it in at run time so in this case you can just copy what the compiler does for a C function that does what you want. As long as you follow the calling convention / ABI and don't do anything a C compiler wouldn't (like write to the GOT), you shouldn't get any UB. – Peter Cordes Aug 09 '19 at 05:43
  • @AlexWang: TL:DR: if you know enough about how things work in the big picture, looking at compiler output is very useful to getting the syntax details right. – Peter Cordes Aug 09 '19 at 05:44
  • Yes, of course I agree with you. I just mean to say that it should not replace the homework for docs/manuals. – Alex Wang Aug 09 '19 at 05:58
  • g++ compiles the file as C++ not C, but that works ok as C++ name mangling only applies to function *names* and not to variable names, even if they have a function pointer type. – Chris Dodd Aug 09 '19 at 17:19