
I am compiling non-PIC code with GCC, and I noticed that the generated assembly does not call the function through its plain address but adds a strange offset to it.

I use GCC 9.3.0, invoked as gcc test.c -o test-nopic -mcmodel=large -no-pie -O0, with the following code. I left out -fPIC.

#include <stdio.h>

int var1 = 1;
int var2 = 2;

void putstr(int* ptr) {
    printf("val: %d\n", *ptr);
}

int main() {
    putstr(&var1);
    putstr(&var2);
}

Here is the listing of main() from objdump -wdrC -M intel test-nopic.

000000000040117e <main>:
  40117e:       55                      push   rbp
  40117f:       48 89 e5                mov    rbp,rsp
  401182:       53                      push   rbx
  401183:       48 83 ec 08             sub    rsp,0x8
  401187:       48 8d 1d f9 ff ff ff    lea    rbx,[rip+0xfffffffffffffff9]        # 401187 <main+0x9>
  40118e:       49 bb 79 2e 00 00 00 00 00 00   movabs r11,0x2e79
  401198:       4c 01 db                add    rbx,r11
  40119b:       48 b8 30 00 00 00 00 00 00 00   movabs rax,0x30
  4011a5:       48 8d 3c 03             lea    rdi,[rbx+rax*1]
  4011a9:       48 b8 26 d1 ff ff ff ff ff ff   movabs rax,0xffffffffffffd126
  4011b3:       48 8d 04 03             lea    rax,[rbx+rax*1]
  4011b7:       ff d0                   call   rax
  4011b9:       48 b8 34 00 00 00 00 00 00 00   movabs rax,0x34
  4011c3:       48 8d 3c 03             lea    rdi,[rbx+rax*1]
  4011c7:       48 b8 26 d1 ff ff ff ff ff ff   movabs rax,0xffffffffffffd126
  4011d1:       48 8d 04 03             lea    rax,[rbx+rax*1]
  4011d5:       ff d0                   call   rax
  4011d7:       b8 00 00 00 00          mov    eax,0x0
  4011dc:       48 83 c4 08             add    rsp,0x8
  4011e0:       5b                      pop    rbx
  4011e1:       5d                      pop    rbp
  4011e2:       c3                      ret

The address of putstr(int*) is 0x401126. readelf -l test-nopic shows that the file type is EXEC, with the following program headers:

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000004d8 0x00000000000004d8  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000275 0x0000000000000275  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                 0x0000000000000168 0x0000000000000168  R      0x1000
  LOAD           0x0000000000002e00 0x0000000000403e00 0x0000000000403e00
                 0x0000000000000238 0x0000000000000240  RW     0x1000
  DYNAMIC        0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000004002c4 0x00000000004002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000002010 0x0000000000402010 0x0000000000402010
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002e00 0x0000000000403e00 0x0000000000403e00
                 0x0000000000000200 0x0000000000000200  R      0x1
  1. Why doesn't gcc just use movabs rax, 0x401126 before the calls?
  2. Why is the address(?) used at 4011a9 filled with all these 0xFF bytes?
  3. Why does gcc use the rip register with that odd offset and then add the magic value 0x2e79, which does not match the start of any of the segments listed above?
Peter Cordes
kptnbmb
  • You need to add `-fno-pic`. `-no-pie` only sets the output format. The `0xffff` stuff is just negative numbers in 2's complement. You can use `gcc -S` to get more readable assembly directly from the compiler. – Jester May 24 '20 at 12:57
  • If I compile the code on [godbolt](https://godbolt.org/z/MhELbW) with only `-mcmodel=large`, I get the non-pic code with simple `movabs`, and if I add `-fpic` I get similar code to yours. But on my local Linux box (Ubuntu 19.10), I get the same behavior as you, with the pic code even without `-fpic`. So it must be that both of our systems use `-fpic` or some similar option by default, and that explains why we're seeing the pic code even when we don't use `-fpic`. Though I don't see `-fpic` showing up anywhere in the output of `gcc -v`, nor do I know where this default would be set. – Nate Eldredge May 24 '20 at 16:45
  • What system are you running on? Is it a gcc install that came with the OS or its package manager, or did you build it yourself? If the latter, what configuration options did you use? – Nate Eldredge May 24 '20 at 16:47
  • `gcc -v` shows `--enable-default-pie` which probably turns on the pic as well. PIE needs to be PIC, but PIC doesn't have to be PIE :) – Jester May 24 '20 at 16:54
  • @Jester: Aha, I bet that's it. I also found that if you compile with `-Q` you get the full list of options in effect, and `-fPIC` is clearly visible there. Would you like to post an answer, or shall I? – Nate Eldredge May 24 '20 at 17:06
  • @Jester Thanks for your comment. I wrote an answer with a quick explanation. @NateEldredge I also have `--enable-default-pie` in my gcc configuration on Manjaro (`gcc (Arch Linux 9.3.0-1) 9.3.0`). – kptnbmb May 24 '20 at 17:29

1 Answer

Thanks to @Jester's comment I solved the mystery. I also have to compile with the -fno-pic flag to disable PIE code-gen, which is on by default in most modern GNU/Linux distros. -no-pie is only a linker option; -fno-pic and -fno-pie are code-gen options. See 32-bit absolute addresses no longer allowed in x86-64 Linux?

The code from the question (compiled with -mcmodel=large -no-pie -O0) makes an indirect call through the rax register. The target address is computed relative to rip by the following code.

  401187:       48 8d 1d f9 ff ff ff    lea    rbx,[rip+0xfffffffffffffff9]        # 401187 <main+0x9>
  40118e:       49 bb 79 2e 00 00 00 00 00 00   movabs r11,0x2e79
  401198:       4c 01 db                add    rbx,r11
  40119b:       48 b8 30 00 00 00 00 00 00 00   movabs rax,0x30
  4011a5:       48 8d 3c 03             lea    rdi,[rbx+rax*1]
  4011a9:       48 b8 26 d1 ff ff ff ff ff ff   movabs rax,0xffffffffffffd126
  4011b3:       48 8d 04 03             lea    rax,[rbx+rax*1]
  4011b7:       ff d0                   call   rax

While the lea at 401187 executes, rip holds 0x40118e, the address of the next instruction, so with the displacement of -7 rbx becomes 0x401187; adding 0x2e79 then gives 0x404000. That base is used to compute the address of the function and of its argument (the address of var1 is placed in the rdi register; it points into the RW LOAD segment). With the -fno-pic flag the function call looks the way I wanted.

  40115c:       48 bf 30 40 40 00 00 00 00 00   movabs rdi,0x404030
  401166:       48 b8 26 11 40 00 00 00 00 00   movabs rax,0x401126
  401170:       ff d0                   call   rax

In the default code model (not large):

Without the -mcmodel=large flag (-no-pie -fno-pic -O0) the code looks different. Static data and code are reachable with a 32-bit relative displacement, or even a 32-bit absolute in non-PIE code. This is much more efficient, especially for code; avoid -mcmodel=large whenever possible. Use -mcmodel=medium if you just need some huge static arrays.

Here is the call in that version, now a direct call with a 32-bit relative displacement. Position-dependent code can also put static addresses into registers with an efficient mov r32, imm32 (see How to load address of function or label into register in GNU Assembler).

  401150:       bf 30 40 40 00          mov    edi,0x404030
  401155:       e8 cc ff ff ff          call   401126 <putstr>

Here is the code with just -fpie (enabled by default in my configuration).

    1165:       48 8d 3d c4 2e 00 00    lea    rdi,[rip+0x2ec4]        # 4030 <var1>
    116c:       e8 c8 ff ff ff          call   1139 <putstr>

And here it is after adding the -fpic flag, which also enables symbol interposition for global functions, as in a shared library. There is no real difference after linking, just an extra unnecessary mov instead of putting the arg into rdi in the first place. (This is an artifact of -O0: compile fast, not well.)

    1165:       48 8d 05 c4 2e 00 00    lea    rax,[rip+0x2ec4]        # 4030 <var1>
    116c:       48 89 c7                mov    rdi,rax
    116f:       e8 c5 ff ff ff          call   1139 <putstr>

gcc -O0 happens to avoid the extra mov if we declare var1 as static (What does “static” mean in C?). Or more simply, enable optimization with at least -Og, more usually -O2 or -O3. Un-optimized code is full of wasted instructions.

Peter Cordes
kptnbmb
  • You don't need to change the source, just enable optimization. Un-optimized compiler output is full of wasted instructions. – Peter Cordes May 24 '20 at 20:44
  • Yes, you are right. I didn't mention this, to stay consistent, because the rest of the code is compiled with `-O0`. – kptnbmb May 25 '20 at 14:20
  • Right, but it's bad advice to even think about changing your source code to micro-optimize the `-O0` asm output. If you care about the quality of the asm, simply don't use `-O0`. Differences like that will go away. (It's not a bad thing to make variables `static` when possible, though, but there's no reason to expect any benefit in general. Except with `-fPIC` for a shared library; avoiding the indirection that supports symbol interposition is good.) – Peter Cordes May 25 '20 at 19:48