7

How does the linker find the main function in an x86-64 ELF-format executable?

RouteMapper

2 Answers

3

As a very generic overview: the linker assigns an address to the block of code identified by the symbol main, as it does for every symbol in your object files.

Strictly speaking, it doesn't assign a real address; it assigns an address relative to some base, which the loader translates to a real address when the program is executed.

The actual entry point is most likely not main but a symbol in the C runtime startup code (crt) that calls main. By default, ld looks for the symbol start (spelled _start on typical Linux toolchains) unless you specify something different.

The linked code ends up in the .text section of the executable and could look something like this (very simplified):

Address | Code
1000      someFunction
...
2000      start
2001        call 3000
...
3000      main
...

When the linker writes the ELF header, it records the entry point as address 2000.

You can get the relative address of main by dumping the contents of the executable with something like objdump. To get the actual address at runtime you can simply take the address of the symbol, as in funcptr ptr = main;, where funcptr is defined as a pointer to a function with the signature of main.

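#include <stdio.h>

typedef int (*funcptr)(int argc, char* argv[]);

int main(int argc, char* argv[])
{
    funcptr ptr = main;  /* runtime address of main */
    /* %p expects a void pointer, so cast explicitly */
    printf("%p\n", (void*)ptr);
    return 0;
}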

The address of main is resolved correctly even if the symbols have been stripped, because the linker has already resolved the reference to main to its relative address at link time.

Use objdump like this:

$ objdump -f funcptr.exe 

funcptr.exe:     file format pei-i386
architecture: i386, flags 0x0000013a:
EXEC_P, HAS_DEBUG, HAS_SYMS, HAS_LOCALS, D_PAGED
start address 0x00401000

Looking for main specifically, on my machine I get this:

$ objdump -D funcptr.exe | grep main
  40102c:       e8 af 01 00 00          call   4011e0 <_cygwin_premain0>
  401048:       e8 a3 01 00 00          call   4011f0 <_cygwin_premain1>
  401064:       e8 97 01 00 00          call   401200 <_cygwin_premain2>
  401080:       e8 8b 01 00 00          call   401210 <_cygwin_premain3>
00401170 <_main>:
  401179:       e8 a2 00 00 00          call   401220 <___main>
004011e0 <_cygwin_premain0>:
004011f0 <_cygwin_premain1>:
00401200 <_cygwin_premain2>:
00401210 <_cygwin_premain3>:
00401220 <___main>:

Note that I am on Windows using Cygwin, so the file format here is PE rather than ELF and your results will differ slightly. It looks like main lives at 00401170 for me.

Dave Rager
  • So there's no way to determine what the address of `main` will be before runtime? – RouteMapper Jul 17 '13 at 20:01
  • I see your edited post now. I understand that `start`'s address is in the ELF header. But is there any way to statically compute the address of `main`? Sometimes the address for main is dynamically relocated. – RouteMapper Jul 17 '13 at 20:11
  • @RouteMapper you're doing static analysis, right? Relocation does not exist then. It's the loader's job to relocate, and *you're the loader*, just decide not to relocate. – harold Jul 17 '13 at 20:13
  • You can get the relative address of `main` by dumping the contents of the exe with something like `objdump`. To get the actual address at runtime you can just read the symbol `funcptr ptr = main` where `funcptr` is defined as a pointer to a function with the signature of `main`. – Dave Rager Jul 17 '13 at 20:17
  • Suppose it's a stripped executable. That symbol information no longer exists. How then can I find the address of that function which `start` calls (i.e. `main`)? – RouteMapper Jul 17 '13 at 20:20
  • I'm not sure how to determine the relative address of `main`. What function in `objdump` do you use to find it? – RouteMapper Jul 17 '13 at 20:46
  • @DaveRager - That only works if the executable isn't stripped, right? – RouteMapper Jul 17 '13 at 21:06
  • Yes. If the executable is stripped you will have to do it the hard way starting at the entry point (which you can find) and tracing call statements until you find what seems to be your main function. – Dave Rager Jul 17 '13 at 21:09
  • I did trace it, but it gets to the GOT, which hasn't been initialized. As such, I can't determine where `main` is. – RouteMapper Jul 17 '13 at 21:32
2

On Binutils, it is determined by either:

  • -e CLI option
  • linker script

You can view your linker script with:

ld --verbose

Mine contains:

ENTRY(_start)

Then, at link time, glibc-provided object files such as crt1.o, which contain the _start symbol, are passed to the linker together with your main.o.

Those object files do some setup for you, such as preparing argv, and then call your main function.

You can see those extra object files being sneaked in with gcc -v.

This is documented at: https://sourceware.org/binutils/docs/ld/Entry-Point.html#Entry-Point

The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name:

 ENTRY(symbol)

There are several ways to set the entry point. The linker will set the entry point by trying each of the following methods in order, and stopping when one of them succeeds:

  • the `-e' entry command-line option;
  • the ENTRY(symbol) command in a linker script;
  • the value of a target specific symbol, if it is defined; For many targets this is start, but PE and BeOS based systems for example check a list of possible entry symbols, matching the first one found.
  • the address of the first byte of the `.text' section, if present;
  • The address 0.

See also: is there a GCC compiler/linker option to change the name of main?

Ciro Santilli OurBigBook.com