
For various purposes, I am trying to obtain the address of the ELF header of the main executable without parsing /proc/self/maps. I have tried walking the link_map chain obtained through the dlopen/dlinfo functions, but it does not contain an entry whose l_addr points to the base address of the main executable. Is there any way to do this (standard or not) without parsing /proc/self/maps?

An example of what I'm trying to do:

#include <stdio.h>
#include <elf.h>
int main()
{
    Elf32_Ehdr* header = /* Somehow obtain the address of the ELF header of this program */;
    printf("%p\n", header);
    /* Read the header and do stuff, etc */
    return 0;
}
小太郎

2 Answers


The void * handle returned by dlopen(NULL, RTLD_LAZY) gives you a struct link_map * that corresponds to the main executable.

Calling dl_iterate_phdr also reports the entry for the main executable, on the very first invocation of the callback.

You are likely confused by the fact that .l_addr == 0 in the link map, and that dlpi_addr == 0 when using dl_iterate_phdr.

This happens because l_addr (and dlpi_addr) don't actually record the load address of an ELF image. Rather, they record the relocation that has been applied to that image.

Usually the main executable is built to load at 0x400000 (for x86_64 Linux) or at 0x08048000 (for ix86 Linux), and is loaded at that same address (i.e. it is not relocated).

But if you link your executable with the -pie flag, it will be linked at 0x0 and relocated to some other address when it is loaded.

So how do you get to the ELF header?

2023 Update:

Isn't a simpler method (if relying on undocumented details), just to call dladdr on the l_ld address in the struct link_map, and then use dli_fbase out of that? – Simon Kissane

Indeed it is. Here is a much simpler solution:

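#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

int main(void)
{
  /* _DYNAMIC (declared in <link.h>) is the address of this executable's
     own .dynamic section, i.e. the same address as l_ld in its link_map. */
  void *dyn = _DYNAMIC;
  Dl_info info;
  /* dladdr() finds the object containing that address; dli_fbase is the
     address at which that object (here, the main executable) was loaded,
     which is where its ELF header is mapped. */
  if (dladdr(dyn, &info) != 0) {
    printf("a.out loaded at %p\n", info.dli_fbase);
  }
  return 0;
}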
gcc -g -Wall -Wextra x.c -ldl && ./a.out
a.out loaded at 0x556433ea0000  # high address here because my GCC defaults to PIE.

gcc -g -Wall -Wextra x.c -ldl -no-pie && ./a.out
a.out loaded at 0x400000

gcc -g -Wall -Wextra x.c -ldl -no-pie -m32 && ./a.out
a.out loaded at 0x8048000

Original 2012 solution:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <link.h>
#include <stdio.h>
#include <stdlib.h>

static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
  int j;
  (void) size;
  (void) data;

  /* The first object reported is the main executable. */
  printf("relocation: 0x%lx\n", (unsigned long) info->dlpi_addr);

  for (j = 0; j < info->dlpi_phnum; j++) {
    if (info->dlpi_phdr[j].p_type == PT_LOAD) {
      /* The first PT_LOAD segment normally starts at file offset 0,
         i.e. with the ELF header. */
      printf("a.out loaded at %p\n",
             (void *) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
      break;
    }
  }
  /* A nonzero return value stops dl_iterate_phdr(), so only the
     main executable is examined. */
  return 1;
}

int
main(void)
{
  dl_iterate_phdr(callback, NULL);
  exit(EXIT_SUCCESS);
}

    
$ gcc -m32 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x8048000

$ gcc -m64 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x400000

$ gcc -m32 -pie -fPIC t.c && ./a.out
relocation: 0xf7789000
a.out loaded at 0xf7789000

$ gcc -m64 -pie -fPIC t.c && ./a.out
relocation: 0x7f3824964000
a.out loaded at 0x7f3824964000

Update:

Why does the man page say "base address" and not relocation?

It's a bug ;-)

I am guessing that the man page was written long before prelink, PIE, and ASLR existed. Without prelink, shared libraries are always linked to load at address 0x0, so the relocation and the base address are one and the same.

How come dlpi_name points to an empty string when info refers to the main executable?

It's an accident of implementation.

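The way this works is that the kernel itself maps the executable into memory and tells the loader where its program headers are (in the auxv[] vector, as AT_PHDR and AT_PHNUM). Only in special cases (e.g. binfmt_misc with the O flag) does the kernel instead hand the loader an already-open file descriptor, as AT_EXECFD. Either way, everything the loader knows about the executable comes from its contents, not from its name.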

There is no easy way on UNIX to map a file descriptor back to the name it was opened as. For one thing, UNIX supports hard links, so multiple filenames can refer to the same file.

Newer Linux kernels also pass in the name that was used to execve(2) the executable (also in auxv[], as AT_EXECFN). But that is optional, and even when it is passed in, glibc doesn't put it into .l_name / dlpi_name, in order not to break existing programs that came to depend on the name being empty.

Instead, glibc saves that name in __progname and __progname_full.

The loader could readlink(2) the name from /proc/self/exe on systems that don't provide AT_EXECFN, but the /proc file system is not guaranteed to be mounted either, so that would still leave it with an empty name sometimes.

Employed Russian
  • Yea I guess I was confused then. But I'm wondering, where in the manpage says that `l_addr` or `dlpi_addr` is the relocated address? All the manpages I've read just says "base address" – 小太郎 Jan 16 '12 at 07:55
  • Also, how come `dlpi_name` points to an empty string when `info` refers to the main executable? Shouldn't it contain the name of the main executable? – 小太郎 Jan 16 '12 at 08:37
  • I've updated the answer. You get 3 answers for the price of 1 ;-) – Employed Russian Jan 16 '12 at 14:00
    Isn't a simpler method (if relying on undocumented details), just to call `dladdr` on the `l_ld` address in the `struct link_map`, and then use `dli_fbase` out of that? – Simon Kissane Aug 25 '22 at 10:51

There is the glibc dl_iterate_phdr() function. I'm not sure it gives you exactly what you want, but it is as close as I know of:

"The dl_iterate_phdr() function allows an application to inquire at run time to find out which shared objects it has loaded." http://linux.die.net/man/3/dl_iterate_phdr

gby
  • It gets all the shared objects that the program has loaded, which I can already do by going through the link_map chain, and that is probably what the function does internally. But I want the base address of the application itself, not the shared objects it has loaded. – 小太郎 Jan 16 '12 at 06:49
  • Have you tested that it does not return the application itself? – Simon Richter Jan 16 '12 at 07:14
  • The application executable is actually one of the loaded shared objects. I agree it's not clear from the man page or my answer :-) – gby Jan 16 '12 at 09:46