18

I need to get the start and end address of an executable's text section. How can I get it?

I can get the starting address from the _init symbol or the _start symbol, but what about the ending address? Should I treat the end of the text section as the last address before the start of the .rodata section?

Or should I edit the default ld script, add my own symbols to mark the start and end of the text section, and pass the script to GCC when compiling? In that case, where should I place the new symbols, and should the init and fini sections be included?

What is a good way to get the start and end address of the text section?

Peter Mortensen
phoxis

4 Answers

29

The GNU binutils default linker scripts for ELF-based platforms normally define quite a number of different symbols which can be used to find the start and end of various sections.

The end of the text section is usually referenced by a choice of three different symbols: etext, _etext or __etext; the start can be found as __executable_start. (Note that these symbols are usually exported using the PROVIDE() mechanism, which means that they will be overridden if something else in your executable defines them rather than merely referencing them. In particular that means that _etext or __etext are likely to be safer choices than etext.)

Example:

$ cat etext.c
#include <stdio.h>

extern char __executable_start;
extern char __etext;

int main(void)
{
  printf("0x%lx\n", (unsigned long)&__executable_start);
  printf("0x%lx\n", (unsigned long)&__etext);
  return 0;
}
$ gcc -Wall -o etext etext.c
$ ./etext
0x8048000
0x80484a0
$

I don't believe that any of these symbols are specified by any standard, so this shouldn't be assumed to be portable. (I have no idea whether even GNU binutils provides them for all ELF-based platforms, or whether the set of symbols provided has changed over different binutils versions.) That said, if a) you are doing something that needs this information, and b) you're considering hacked linker scripts as an option, then portability probably isn't too much of a concern!

To see the exact set of symbols you get when building a particular thing on a particular platform, give the --verbose flag to ld (or -Wl,--verbose to gcc) to print the linker script it chooses to use (there are really several different default linker scripts, which vary according to linker options and the type of object you're building).

Matthew Slattery
  • which would be a better option then? hack the linker script and insert my own symbols ? – phoxis Sep 11 '11 at 05:55
  • 2
    No, if these symbols work on your platform, you might as well use them. The above example code works on at least Linux x86, Linux ppc and NetBSD x86 - I just don't know whether there are other platforms it won't work on. (A hacked linker script is *less* portable: a hacked Linux x86 linker script almost certainly won't work on Linux ppc, for example.) – Matthew Slattery Sep 11 '11 at 14:01
  • `extern const char __executable_start[];` would avoid having to use `&` to take the address, making it work syntactically more like a NASM label/symbol. And would avoid having an actual object you could assign to, like how labels in asm are zero-width and can be at the end of things. – Peter Cordes Jul 19 '22 at 20:14
  • Why does this result differ from the text section size reported by `size`? – SolskGaer Apr 18 '23 at 08:55
8

It's incorrect to speak of "the" text segment, since there may be more than one (guaranteed for the usual case when you have shared libraries, but it's still possible for a single ELF binary to have multiple PT_LOAD segments with the same flags anyway).

The following sample program dumps out all the information returned by dl_iterate_phdr. You're interested in any segment of type PT_LOAD with the PF_X flag (note that PT_GNU_STACK will include the flag if -z execstack is passed to the linker, so you really do have to check both).

#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

const char *type_str(ElfW(Word) type)
{
    switch (type)
    {
    case PT_NULL:
        return "PT_NULL"; // should not be seen at runtime, only in the file!
    case PT_LOAD:
        return "PT_LOAD";
    case PT_DYNAMIC:
        return "PT_DYNAMIC";
    case PT_INTERP:
        return "PT_INTERP";
    case PT_NOTE:
        return "PT_NOTE";
    case PT_SHLIB:
        return "PT_SHLIB";
    case PT_PHDR:
        return "PT_PHDR";
    case PT_TLS:
        return "PT_TLS";
    case PT_GNU_EH_FRAME:
        return "PT_GNU_EH_FRAME";
    case PT_GNU_STACK:
        return "PT_GNU_STACK";
    case PT_GNU_RELRO:
        return "PT_GNU_RELRO";
    case PT_SUNWBSS:
        return "PT_SUNWBSS";
    case PT_SUNWSTACK:
        return "PT_SUNWSTACK";
    default:
        if (PT_LOOS <= type && type <= PT_HIOS)
        {
            return "Unknown OS-specific";
        }
        if (PT_LOPROC <= type && type <= PT_HIPROC)
        {
            return "Unknown processor-specific";
        }
        return "Unknown";
    }
}

const char *flags_str(ElfW(Word) flags)
{
    switch (flags & (PF_R | PF_W | PF_X))
    {
    case 0 | 0 | 0:
        return "none";
    case 0 | 0 | PF_X:
        return "x";
    case 0 | PF_W | 0:
        return "w";
    case 0 | PF_W | PF_X:
        return "wx";
    case PF_R | 0 | 0:
        return "r";
    case PF_R | 0 | PF_X:
        return "rx";
    case PF_R | PF_W | 0:
        return "rw";
    case PF_R | PF_W | PF_X:
        return "rwx";
    }
    __builtin_unreachable();
}

static int callback(struct dl_phdr_info *info, size_t size, void *data)
{
    int j;
    (void)data;

    printf("object \"%s\"\n", info->dlpi_name);
    printf("  base address: %p\n", (void *)info->dlpi_addr);
    if (size > offsetof(struct dl_phdr_info, dlpi_adds))
    {
        /* dlpi_adds and dlpi_subs are unsigned long long counters */
        printf("  adds: %llu\n", info->dlpi_adds);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_subs))
    {
        printf("  subs: %llu\n", info->dlpi_subs);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_tls_modid))
    {
        printf("  tls modid: %zu\n", info->dlpi_tls_modid);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_tls_data))
    {
        printf("  tls data: %p\n", info->dlpi_tls_data);
    }
    printf("  segments: %d\n", info->dlpi_phnum);

    for (j = 0; j < info->dlpi_phnum; j++)
    {
        const ElfW(Phdr) *hdr = &info->dlpi_phdr[j];
        printf("    segment %2d\n", j);
        printf("      type: 0x%08X (%s)\n", hdr->p_type, type_str(hdr->p_type));
        printf("      file offset: 0x%08zX\n", hdr->p_offset);
        printf("      virtual addr: %p\n", (void *)hdr->p_vaddr);
        printf("      physical addr: %p\n", (void *)hdr->p_paddr);
        printf("      file size: 0x%08zX\n", hdr->p_filesz);
        printf("      memory size: 0x%08zX\n", hdr->p_memsz);
        printf("      flags: 0x%08X (%s)\n", hdr->p_flags, flags_str(hdr->p_flags));
        printf("      align: %zu\n", hdr->p_align);
        if (hdr->p_memsz)
        {
            printf("      derived address range: %p to %p\n",
                (void *) (info->dlpi_addr + hdr->p_vaddr),
                (void *) (info->dlpi_addr + hdr->p_vaddr + hdr->p_memsz));
        }
    }
    return 0;
}

int main(void)
{
    dl_iterate_phdr(callback, NULL);

    exit(EXIT_SUCCESS);
}
o11c
2

For Linux, consider using the nm(1) tool to inspect which symbols the object file provides. Looking through its output, you will find both of the symbols that Matthew Slattery used in his answer.

sholsapp
1

.rodata is not guaranteed to always come directly after .text. You can use objdump -h file or readelf --sections file to get more information. objdump reports both the size of each section and its offset into the file.

Emil Romanus