
So I've made an ELF64 executable file with my own compiler/linker. It is very basic: its only dependency is puts from libc, so it has one entry in the symbol table and one relocation entry.

If I run it with the linker explicitly, it works just fine and prints the letter A: /lib64/ld-linux-x86-64.so.2 ./o

If I run it by itself (./o), I get a segfault in _dl_relocate_object() at dl-reloc.c:232, which in my version of Ubuntu (16.04) is:

    /* Do the actual relocation of the object's GOT and other data.  */
    
    /* String table object symbols.  */
    const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);

Here is the output of readelf:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x406000
  Start of program headers:          12672 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         7
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections to group in this file.

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x003180 0x0000000000408180 0x0000000000408180 0x000188 0x000188 R   0x8
  INTERP         0x003118 0x0000000000408118 0x0000000000408118 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x001000 0x0000000000405000 0x0000000000405000 0x000038 0x000038 RW  0x1000
  LOAD           0x002000 0x0000000000406000 0x0000000000406000 0x000038 0x000038 R E 0x1000
  LOAD           0x003000 0x0000000000407000 0x0000000000407000 0x000074 0x000080 RW  0x1000
  LOAD           0x003078 0x0000000000408078 0x0000000000408078 0x000290 0x000290 RW  0x1000
  DYNAMIC        0x003078 0x0000000000408078 0x0000000000408078 0x000290 0x000290 RW  0x1000

Dynamic section at offset 0x3078 contains 9 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000005 (STRTAB)             0x408108
 0x0000000000000006 (SYMTAB)             0x408138
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000007 (RELA)               0x408168
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000000000000a (STRSZ)              16 (bytes)
 0x0000000000000000 (NULL)               0x0

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.

So, what's wrong with my file, and how can I get it to run without prepending the linker name on the command line?

EDIT: My ELF file requests PT_DYNAMIC to be loaded at virtual address 0x408078.

When run as /lib64/ld-linux-x86-64.so.2 ./o, the directive is followed and PT_DYNAMIC is loaded at 0x408078. However, when run as ./o, PT_DYNAMIC gets loaded at 0x407078. It happens that the preceding segment (a variables segment) ends at 0x40707F (it starts at 0x407000 and spans 0x80 bytes), so zeroing its tail writes 0x0 over the bytes starting at 0x407078. These zero bytes overwrite the DT_NEEDED tag that the PT_DYNAMIC segment starts with. Consequently, the dynamic loader thinks that PT_DYNAMIC is empty and cannot find any tags. In particular, it cannot find the DT_STRTAB tag, which caused the trap that I described in my question.

Interestingly, it turns out that the actual, uncorrupted copy of PT_DYNAMIC is in fact present at 0x408078, as the program header table directs.
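The overlap described above can be checked with a little arithmetic. The following is only a sketch, assuming the loader zero-initialises the part of a LOAD segment that has no file backing (from p_vaddr + p_filesz up to p_vaddr + p_memsz). The names zero_fill_range and range_contains are hypothetical helpers, not part of any library; the numbers come from the third LOAD entry in the readelf output (FileSiz 0x74, MemSiz 0x80).

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range { uint64_t start, end; };

/* Bytes of a loadable segment that have no file backing and are
   therefore zero-initialised: [p_vaddr + p_filesz, p_vaddr + p_memsz).
   Hypothetical helper for illustration only. */
static struct range zero_fill_range(uint64_t p_vaddr, uint64_t p_filesz,
                                    uint64_t p_memsz)
{
    assert(p_memsz >= p_filesz);  /* the file image cannot exceed the memory image */
    struct range r = { p_vaddr + p_filesz, p_vaddr + p_memsz };
    return r;
}

/* Nonzero if addr lies inside the half-open range r. */
static int range_contains(struct range r, uint64_t addr)
{
    return addr >= r.start && addr < r.end;
}
```

For the segment at 0x407000 this gives the range 0x407074 to 0x407080, which covers the eight-byte d_tag field of a dynamic entry located at 0x407078. A zeroed d_tag reads as DT_NULL, which is consistent with the loader treating the table as empty.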

Here's how I figured this all out:

  1. gdb ./o

  2. break _dl_relocate_object

  3. r (when run with linker name prepended to command line, it first stops at the breakpoint while loading the linker itself, so I had to cont; then the 2nd time it stops is for my file)

after it stops at the breakpoint:

  1. info registers rdi ; this is the function's first argument, struct link_map *l

  2. x/8ag <rdi value> ; inspect the values of the link_map members. link_map is defined as:

  struct link_map
  {
    /* These first few members are part of the protocol with the debugger.
       This is the same format used in SVR4.  */

    ElfW(Addr) l_addr;          /* Difference between the address in the ELF
                                   file and the addresses in memory.  */
    char *l_name;               /* Absolute file name object was found in.  */
    ElfW(Dyn) *l_ld;            /* Dynamic section of the shared object.  */
    struct link_map *l_next, *l_prev; /* Chain of loaded objects.  */
  };

Thus, the third printed quadword (octabyte) is the member l_ld. When started without the dynamic loader's name on the commandline, l_ld = 0x407078. When started with the dynamic loader's name prepended to the command line ("gdb --args /lib64/ld-linux-x86-64.so.2 ./o"), it shows l_ld = 0x408078.

Why the difference?

  1. It is easy to see the corrupt PT_DYNAMIC values: x/18xg 0x407078

Also, even when the loader's name is prepended to the command line, and l_ld is correct, and PT_DYNAMIC is not corrupt at 0x408078 - there still is a copy of it at 0x407078, and it is not corrupt either.
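The presence of two copies is what one would expect if segments are mapped in whole pages. This is a sketch under that assumption (4 KiB pages, each segment mapped from the page-aligned file offset to the page-aligned virtual address, as with mmap); first_page_mapping is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for x86-64 with 4 KiB pages. */
static const uint64_t sketch_page_size = 0x1000;

/* Round x down to the start of its page. */
static uint64_t page_floor(uint64_t x)
{
    return x & ~(sketch_page_size - 1);
}

/* Where the first page of a segment ends up: which file page is
   mapped at which virtual page. */
struct mapping { uint64_t vaddr, offset; };

static struct mapping first_page_mapping(uint64_t p_vaddr, uint64_t p_offset)
{
    /* A page-granular mapping only works if both values share the
       same offset within the page. */
    assert(p_vaddr % sketch_page_size == p_offset % sketch_page_size);
    struct mapping m = { page_floor(p_vaddr), page_floor(p_offset) };
    return m;
}
```

With the values from the program headers, the segment at 0x407000 (offset 0x3000) and the segment at 0x408078 (offset 0x3078) both map file page 0x3000, once at 0x407000 and once at 0x408000. File byte 0x3078 is then visible at both 0x407078 and 0x408078.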

So how do I get it to work and load my segments properly?

It's interesting that link_map's first member, l_addr, is -4096 (-0x1000), which is exactly the difference between 0x407078 and 0x408078. So it looks like the loader (dynamic or static?) makes a deliberate decision at some point to load the segment at an address different from what the ELF file is requesting. Why does it?
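One possible explanation, which nothing in this thread confirms, is that the -0x1000 falls out of how the program header address is derived. The sketch below rests on two assumptions: that the kernel reports the program headers at (p_vaddr - p_offset of the first LOAD segment) + e_phoff, and that the dynamic loader sets l_addr to that reported address minus the p_vaddr of PT_PHDR. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the kernel takes the program headers to live at
   (first LOAD p_vaddr - first LOAD p_offset) + e_phoff. */
static uint64_t assumed_at_phdr(uint64_t first_load_vaddr,
                                uint64_t first_load_offset,
                                uint64_t e_phoff)
{
    assert(first_load_vaddr >= first_load_offset);
    return first_load_vaddr - first_load_offset + e_phoff;
}

/* ASSUMPTION: the dynamic loader computes
   l_addr = reported phdr address - PT_PHDR p_vaddr. */
static int64_t assumed_l_addr(uint64_t at_phdr, uint64_t phdr_vaddr)
{
    return (int64_t)(at_phdr - phdr_vaddr);
}
```

Plugging in the readelf values (first LOAD at 0x405000 with offset 0x1000, e_phoff 12672 = 0x3180, PT_PHDR at 0x408180) yields a reported address of 0x407180 and an l_addr of -0x1000, matching what gdb shows. Under these assumptions the cause would be that p_vaddr - p_offset is 0x404000 for the first three LOAD segments but 0x405000 for the last one, which also holds PT_PHDR.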

adimetrius
  • Why are you loading at 0x408078 instead of 0x408000? – user123 Nov 14 '20 at 19:57
  • Honestly, I've never studied the dynamic part of the ELF specification. As far as I know, your code needs to be on 4KB boundaries to avoid requiring multiple pages for your code. Basically, the Ubuntu ELF loader expects code to be on 4KB boundaries, so it loads your code improperly and can behave strangely if it isn't. When you run an ELF file, Linux sets up a new process. The process will have its own full virtual address space. If your code isn't on a 4KB boundary, Linux will need to set up multiple pages in its page tables for the same code. – user123 Nov 14 '20 at 20:02
  • Another thing is the p_offset member for your dynamic segment. Are you sure it is not "over" the segment just before? Because it leaves only 0x78 bytes for the third LOAD segment's code. Linux isn't made to check for every mistake users can make while implementing the ELF spec. It expects well-formed ELF files. So it's possible that misaligned code leads to weird behavior. – user123 Nov 14 '20 at 20:10
  • I'm loading at 0x408078 because the file offset of the segment is 3078; it is required that the last three digits of the offset and address are equal. – adimetrius Nov 17 '20 at 00:05
  • yes, p_offset is exact and correct, the previous data is 0x78 bytes long – adimetrius Nov 17 '20 at 00:06
  • I have finally got it to work by skipping extra bytes in the file, making offset 0x4000, and loading at address 0x409000. I still don't see from the specification why it works (and why the other layout didn't). – adimetrius Nov 17 '20 at 00:08

1 Answer


As stated in the linked question "Required alignment of .text versus .data", the ELF spec requires some alignment.

It is stated in the ELF spec (program header part) that

Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. This member gives the value to which the segments are aligned in memory and in the file. Values 0 and 1 mean that no alignment is required. Otherwise, p_align should be a positive, integral power of 2, and p_vaddr should equal p_offset, modulo p_align.
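The quoted rule is easy to test mechanically. Here is a minimal sketch, with segment_is_congruent as a hypothetical helper name, that applies the rule to a single program header entry.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if x is a positive power of two. */
static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Returns 1 if p_vaddr and p_offset are congruent modulo p_align,
   0 otherwise. p_align values 0 and 1 mean "no alignment required". */
static int segment_is_congruent(uint64_t p_vaddr, uint64_t p_offset,
                                uint64_t p_align)
{
    if (p_align <= 1)
        return 1;
    assert(is_power_of_two(p_align));  /* the spec asks for a power of 2 */
    return (p_vaddr & (p_align - 1)) == (p_offset & (p_align - 1));
}
```

Note that the last LOAD segment in the question (address 0x408078, offset 0x3078, alignment 0x1000) passes this check, since both values leave a remainder of 0x78. The congruence rule on its own therefore does not flag that layout.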

You seem to have compiled your ELF file for x86-64 processors. With 4K pages, x86-64 processors do not use the low 12 bits of a virtual address to select the page in physical memory.

What I think is the problem here is that Linux cannot map 0x80 bytes of a page as data and the rest of the page as code. Code and data pages don't have the same permissions. Data has R/W permission while code has R/E permission. Since Linux's code has some error handling, it can determine that there is a problem, so it attempts to fix it but fails to do so, and that makes your program crash.

It is not for nothing that the ELF spec requires alignment. When you create an ELF file, you need to take into account the processor on which the code will execute. x86-64 processors require p_align to be 4K simply because the page size is 4K. It is a physical constraint of this processor: the OS cannot apply different permissions within the same page because memory is "separated" into chunks of 4K.

The last 12 bits of the virtual address are used to identify the offset in the physical page in main memory. The OS is limited in mapping its page tables to respect the ELF file you built. It simply cannot do what you ask.

user123