
I've read in a few places that ASLR is supposed to load the .data section at random addresses each time a program is run, which means the addresses of global variables should be different. However, if I have the following code:

int global_var = 42;

int main()
{
    global_var = 10;
    return 0;
}

and I compile it with gcc -fpie -o global global.c, objdump -d -M intel shows the following:

  4004ed:   55                      push   rbp
  4004ee:   48 89 e5                mov    rbp,rsp
  4004f1:   c7 05 3d 0b 20 00 0a    mov    DWORD PTR [rip+0x200b3d],0xa        # 601038 <global_var>

It appears that global_var will always be placed at 601038. Indeed, if I compile with debugging symbols, global_var's DIE has that address hardcoded:

$ gcc -ggdb3 -fpie -o global global.c
$ objdump --dwarf=info global
...
<1><55>: Abbrev Number: 4 (DW_TAG_variable)
   <56>   DW_AT_name        : (indirect string, offset: 0x30c): global_var  
   <5a>   DW_AT_decl_file   : 1 
   <5b>   DW_AT_decl_line   : 1 
   <5c>   DW_AT_type        : <0x4e>    
   <60>   DW_AT_external    : 1 
   <60>   DW_AT_location    : 9 byte block: 3 38 10 60 0 0 0 0 0    (DW_OP_addr: 601038)

How does ASLR work in these cases?

Martin
  • I believe ASLR only works on the code, stack and heap. Not necessarily on the data. Could be wrong, so not putting this as an answer. – Ricky Mutschlechner Apr 20 '16 at 19:12
  • Did you try running and debugging with `gdb`? I'm not sure, but maybe there is some kind of relocation done there. – Tomer Apr 20 '16 at 20:10
  • @RickyMutschlechner: The distance between code (.text) and data (.data) is a link-time constant; that's [how RIP-relative addressing can work](https://stackoverflow.com/questions/56262889/why-are-global-variables-in-x86-64-accessed-relative-to-the-instruction-pointer) (or 32-bit PIC adding offsets to GOT). In a non-PIE, the code and BSS can't be ASLRed either, only the stack and mmap allocations. All static storage is loaded at the addresses chosen by the linker at static link time, in a non-PIE. – Peter Cordes Dec 06 '21 at 20:14

2 Answers


The disassembly shows 601038 only as a convenience, computed relative to an arbitrary base (0x400000). Read the actual instruction: it writes to DWORD PTR [rip+0x200b3d], where rip is the instruction pointer. The code and data are at a fixed offset relative to each other, and randomizing the base address doesn't change that. Because the address is computed from the instruction pointer, it already incorporates the ASLR relocation.

The disassembler shows the 601038 mapping because the fixed offsets from rip scattered throughout the code all depend on where each instruction is located, so they can't be compared with one another without adjusting for the instruction's location. The disassembler knows each instruction's address, though, so it can combine the offset with that address for you (strictly, with the address of the following instruction, which is what rip holds while the instruction executes) to get globally comparable addresses for the common 0x400000 base.
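The arithmetic behind the disassembler's annotation can be sketched in a few lines of C. This is only an illustration: the helper names and the `load_bias` parameter are made up for this example. It assumes the instruction from the question is 10 bytes long (c7 05, a 4-byte displacement, and a 4-byte immediate; objdump wraps the last three bytes onto the next line).

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand. While an instruction
 * executes, rip already points at the NEXT instruction, so the target
 * is: instruction address + instruction length + signed displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp32)
{
    assert(insn_len >= 1 && insn_len <= 15); /* x86 limit: 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* The same instruction after the whole image (code and data together)
 * has been shifted by load_bias, a hypothetical value picked by the
 * loader. The encoded displacement is unchanged; only rip differs. */
uint64_t rip_relative_target_with_bias(uint64_t load_bias,
                                       uint64_t insn_addr,
                                       unsigned insn_len,
                                       int32_t disp32)
{
    return rip_relative_target(load_bias + insn_addr, insn_len, disp32);
}
```

With the values from the question, `rip_relative_target(0x4004f1, 10, 0x200b3d)` gives 0x601038, the address objdump prints. Shifting the image by a bias of 0x1000 gives 0x602038: the target moves by exactly the bias, with no change to the instruction bytes.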

ShadowRanger
  • Yes, but that happens even without `-fpie`. I guess the compiler is being safe just in case? – Martin Apr 20 '16 at 20:15
  • @Martin: RIP-relative addressing is the most efficient way to access static data on x86-64, so yes, compilers use that even with `-fno-pie`. [Why are global variables in x86-64 accessed relative to the instruction pointer?](https://stackoverflow.com/q/56262889) But that *will* get them to use 5-byte `mov esi, offset foo` to put static addresses into registers, e.g. as args for puts, instead of 7-byte `lea rsi, [rip + foo]` like they'd need to use to make it safe to link with `-pie` (a linker option separate from `-fPIE`.) – Peter Cordes Dec 06 '21 at 20:16

When you compile a PIE, the file is in fact technically a shared object (ET_DYN; you can check this with readelf -h filename). ELF files of this type (both PIEs and .so files) are designed to be loadable at any base address (well, usually modulo the page size).

For those files, the virtual addresses (given in the section header table, program header table, symbol table, in the DWARF DIEs, etc.) are offsets from this base address.

This is explained in the System V ABI:

the virtual addresses in the program headers might not represent the actual virtual addresses of the program’s memory image. Executable files typically contain absolute code. [...] On the other hand, shared object segments typically contain position-independent code. This lets a segment’s virtual address change from one process to another, without invalidating execution behavior. Though the system chooses virtual addresses for individual processes, it maintains the segments’ relative positions. Because position-independent code uses relative addressing between segments, the difference between virtual addresses in memory must match the difference between virtual addresses in the file. The difference between the virtual address of any segment in memory and the corresponding virtual address in the file is thus a single constant value for any one executable or shared object in a given process. This difference is the base address.

For DWARF, this is explained in section 7.3 of DWARF 4:

The relocated addresses in the debugging information for an executable object are virtual addresses and the relocated addresses in the debugging information for a shared object are offsets relative to the start of the lowest region of memory loaded from that shared object.

As those files can be mapped at any base address, this base address can be randomized.
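To observe the base address from inside a running process, something like the following sketch may help. It assumes Linux with glibc (or another libc that provides `dl_iterate_phdr` in `<link.h>`) and relies on the documented behaviour that the first object reported is the main program. The function names are invented for this example, and the helpers are meant to be called from your own `main`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

int global_var = 42;

/* dl_iterate_phdr callback: record the base address (dlpi_addr) of the
 * first object reported, which is the main program, then stop. */
static int first_object_bias(struct dl_phdr_info *info, size_t size,
                             void *data)
{
    (void)size;
    *(uintptr_t *)data = (uintptr_t)info->dlpi_addr;
    return 1; /* non-zero return ends the iteration */
}

/* Base address of the main program: the difference between its
 * addresses in memory and the virtual addresses stored in the file. */
uintptr_t main_program_load_bias(void)
{
    uintptr_t bias = 0;
    dl_iterate_phdr(first_object_bias, &bias);
    return bias;
}

/* Convert a runtime address inside the main program back to the
 * virtual address recorded in the ELF file. */
uintptr_t file_vaddr_of(const void *runtime_addr)
{
    uintptr_t addr = (uintptr_t)runtime_addr;
    uintptr_t bias = main_program_load_bias();
    assert(addr >= bias);
    return addr - bias;
}
```

In a PIE build with ASLR enabled, `&global_var` should change from run to run while `file_vaddr_of(&global_var)` should stay the same and match the address that objdump or the DWARF DIE reports. In a non-PIE build the base address is expected to be 0, so the two values coincide.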

ysdx
  • But what happens with the actual code? In the example I had `mov DWORD PTR [rip+0x200b3d],0xa`, which depends on the value of `rip`. This means that the "random" base addresses of `.data` and `.text` are somehow related, right? Could the generated code ever refer to a global variable in terms of just an address (as opposed to a register + offset)? If so, would it break with ASLR on? – Martin Apr 20 '16 at 20:12
  • Yes, the relative positions of the different parts of a given ELF file are fixed. When you compile a PIE, the compiler only uses "relative addressing" instead of absolute addresses. That's the core idea of PIC/PIE (position-independent code). – ysdx Apr 20 '16 at 20:21
  • Oh, I see. I thought each section had an independent base address. Still, could the generated code ever refer to a global variable in terms of just an address (as opposed to a register + offset)? If so, would it break with ASLR on? – Martin Apr 20 '16 at 20:23
  • This would happen if you compile without `-fpie`. In this case, off the top of my head, the link editor might fail to link properly or might generate text relocations (i.e. patch the instruction when loading the file). – ysdx Apr 20 '16 at 20:26