How to make the dwarf sections get loaded into memory in an elf file?

Question

I am writing a C program without the standard library. It is loaded into memory by an ELF loader and then executed. I would also like its DWARF debugging sections to be loaded into memory so that it can print a backtrace at runtime.

To try to achieve this, I've placed the following declarations in my C program:

extern char __my_old_debug_abbrev_start[];
extern char __my_old_debug_abbrev_end[];
extern char __my_old_debug_info_start[];
extern char __my_old_debug_info_end[];
extern char __my_old_debug_str_start[];
extern char __my_old_debug_str_end[];

These let the program figure out where the sections are. To actually provide the locations, I have a linker script which looks like:

SECTIONS
{
  .debug_abbrev : {
    __my_old_debug_abbrev_start = .;
    KEEP (*(.debug_abbrev)) *(.debug_abbrev)
    __my_old_debug_abbrev_end = .;
  }
  .debug_info : {
    __my_old_debug_info_start = .;
    KEEP (*(.debug_info .gnu.linkonce.wi.*)) *(.debug_info .gnu.linkonce.wi.*)
    __my_old_debug_info_end = .;
  }
  .debug_str : {
    __my_old_debug_str_start = .;
    KEEP (*(.debug_str)) *(.debug_str)
    __my_old_debug_str_end = .;
  }
}
INSERT AFTER .rodata;
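As a minimal sketch of how the C side could consume a start/end symbol pair once the link succeeds, the helper below turns two boundary pointers into a length. The names `section_range` and `section_range_len` are hypothetical and not part of the original program. It only uses `<stddef.h>`, which is available without the standard library.

```c
#include <stddef.h>

/* A byte range delimited by a pair of linker-script symbols. */
struct section_range {
    const char *start;
    const char *end;
};

/* Number of bytes in the range; 0 if a bound is missing or they are inverted. */
size_t section_range_len(struct section_range r)
{
    if (r.start == NULL || r.end == NULL || r.end < r.start)
        return 0;
    return (size_t)(r.end - r.start);
}

/*
 * Intended use with the linker-provided symbols (left as a comment because
 * the symbols only exist when linking with the script above):
 *
 *   extern char __my_old_debug_info_start[];
 *   extern char __my_old_debug_info_end[];
 *
 *   struct section_range info = { __my_old_debug_info_start,
 *                                 __my_old_debug_info_end };
 *   size_t info_len = section_range_len(info);
 */
```

Declaring the symbols as `char` arrays, as in the question, means the symbol name itself evaluates to the address, so no `&` is needed.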

First, I compile the C program into libtest.a and then use objcopy to mark the debug sections with the alloc and load flags.

objcopy --set-section-flags '.debug_abbrev=alloc,load' libtest.a
objcopy --set-section-flags '.debug_info=alloc,load' libtest.a
objcopy --set-section-flags '.debug_str=alloc,load' libtest.a
objcopy --set-section-flags '.gnu.linkonce.wi.*=alloc,load' libtest.a

Then, I run gcc on the archive to link it into an executable, along the lines of:

gcc libtest.a -o test -T test.lds -static

This produces errors:

/usr/bin/x86_64-linux-gnu-ld: section .debug_info LMA [0000000000000000,0000000000066291] overlaps section .debug_abbrev LMA [0000000000000000,0000000000007cce]
/usr/bin/x86_64-linux-gnu-ld: section .debug_str LMA [0000000000000000,000000000009d264] overlaps section .debug_info LMA [0000000000000000,0000000000066291]

I'm not sure how to fix this, as the sections only really exist after linking(?). Maybe I could then adjust the LMA with objcopy(?), but I'm not sure where I would place them.

I have seen https://stackoverflow.com/a/31126336/3492895 but I am not sure how I would create the "hole" before linking so I can use objcopy to adjust things.

try to merge the three output sections into one, call it something like `.debug_all`, then post the result — izac89, Sep 17 '18 at 19:37

Winestone · Answer 1 · 2018-09-18T17:33:25.163

Using user2162550's suggestion, the code linked successfully, but my code that prints the function names found in the debugging information printed nothing. I then noticed a comment in the default linker script that gcc uses (shown by passing -Wl,--verbose when linking the executable):

/* DWARF debug sections.
  Symbols in the DWARF debugging sections are relative to the beginning
  of the section so we begin them at 0.  */
...
.debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
.debug_abbrev   0 : { *(.debug_abbrev) }
...

This convinced me that it did not matter where the debugging sections ended up in the final binary. I then tried to use the hole trick (from here), but I wasn't sure how to copy the debugging info in before the executable was linked (once the executable is linked, I don't think objcopy works anymore). So I decided to reserve some space in the binary that is allocated and loaded and then, after linking, copy the required sections into that space.
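Because references inside DWARF are offsets from the start of their section, a reader only needs each section's base pointer and length, wherever the bytes end up. As a rough sketch, here is how a `.debug_str` offset (the kind of value a `DW_FORM_strp` attribute carries) could be resolved against a relocated copy of the section. The function name `debug_str_at` is hypothetical, and the scan is written by hand since the program has no standard library.

```c
#include <stddef.h>

/*
 * Resolve a section-relative offset into a copy of .debug_str.
 * Returns NULL if the offset is out of bounds or if no NUL terminator
 * is found before the end of the section.
 */
const char *debug_str_at(const char *str_start, const char *str_end,
                         size_t offset)
{
    if (str_start == NULL || str_end == NULL || str_end < str_start)
        return NULL;

    size_t len = (size_t)(str_end - str_start);
    if (offset >= len)
        return NULL;

    /* Scan for the terminator without reading past the section end. */
    for (size_t i = offset; i < len; i++) {
        if (str_start[i] == '\0')
            return str_start + offset;
    }
    return NULL;
}
```

With the symbols from the linker script, the call would look like `debug_str_at(__my_debug_str_start, __my_debug_str_end, offset)`.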

To do this, I used the linker script to leave a hole and to provide symbols for locating the debugging sections. The method I got working was to have the linker script first measure the size of each debugging section and then allocate enough space for it. This looks like (in test.lds):

/* This finds the start and end of each section so we know its size */
SECTIONS
{
  .debug_info 0 : {
    __my_old_debug_info_start = .;
    KEEP (*(.debug_info .gnu.linkonce.wi.*)) *(.debug_info .gnu.linkonce.wi.*)
    __my_old_debug_info_end = .;
  }
  .debug_abbrev 0 : {
    __my_old_debug_abbrev_start = .;
    KEEP (*(.debug_abbrev)) *(.debug_abbrev)
    __my_old_debug_abbrev_end = .;
  }
  .debug_str 0 : {
    __my_old_debug_str_start = .;
    KEEP (*(.debug_str)) *(.debug_str)
    __my_old_debug_str_end = .;
  }
}
INSERT AFTER .rodata;

/* This creates some space in the binary which is loaded and big enough to store all the debugging info, as well as marking the start and end of each area */
SECTIONS
{
  .debug_all : {
    __my_debug_info_start = .;
    . += __my_old_debug_info_end - __my_old_debug_info_start;
    __my_debug_info_end = .;
    __my_debug_abbrev_start = .;
    . += __my_old_debug_abbrev_end - __my_old_debug_abbrev_start;
    __my_debug_abbrev_end = .;
    __my_debug_str_start = .;
    . += __my_old_debug_str_end - __my_old_debug_str_start;
    __my_debug_str_end = .;
  }
}
INSERT AFTER .rodata;

I think the choice of .rodata for INSERT AFTER is arbitrary.

Then, I compiled and linked with:

gcc libtest.a -g -o test -T test.lds -static

Taking inspiration from this, I had a bash script parse the output of readelf and compute where in the binary to get the debugging information from and where to copy it to so it would get loaded. The copying is done using dd.

function getSymbolValue {
  binary=$1
  symbol=$2

  # Assumes that this will only find one symbol
  truncated_symbol=`echo $symbol | cut -c 1-25`
  readelf -s $binary | grep $truncated_symbol | awk '{print $2}'
}
function getSectionInfo {
  binary=$1
  section=$2

  # returns all but the [Nr] column of data returned by readelf
  # https://stackoverflow.com/a/3795522/3492895
  readelf -S $binary | cut -c7- | grep '\.'"$section"
}
function getSectionAddress {
  binary=$1
  section=$2

  getSectionInfo $binary $section | awk '{print $3}'
}
function getSectionOffset {
  binary=$1
  section=$2

  getSectionInfo $binary $section | awk '{print $4}'
}
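function copyData {
  binary=$1
  from_start=$2
  to_start=$3
  len=$4

  # Read len bytes starting at byte offset from_start and write them back at byte offset to_start.
  # conv=notrunc keeps the rest of the file intact. The writer has no count= because
  # it would count 512-byte blocks rather than bytes; it stops when the pipe closes.
  dd iflag=skip_bytes,count_bytes if="$binary" skip="$from_start" count="$len" | dd oflag=seek_bytes of="$binary" seek="$to_start" conv=notrunc
}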
function copyDebugSection {
  binary=$1
  from_section=$2
  to_section=$3

  from_off=`getSectionOffset $binary $from_section`
  to_section_off=`getSectionOffset $binary $to_section`
  to_section_addr=`getSectionAddress $binary $to_section`
  to_start_addr=`getSymbolValue $binary "__my_${from_section}_start"`
  to_end_addr=`getSymbolValue $binary "__my_${from_section}_end"`

  copyData $binary $((0x$from_off)) $((0x$to_start_addr - 0x$to_section_addr + 0x$to_section_off)) $((0x$to_end_addr - 0x$to_start_addr))
}

copyDebugSection ./test 'debug_info' 'debug_all'
copyDebugSection ./test 'debug_abbrev' 'debug_all'
copyDebugSection ./test 'debug_str' 'debug_all'
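The destination computed in `copyDebugSection` is the usual address-to-file-offset translation: take the symbol's address, subtract the address of the section containing it, and add that section's file offset. A small sketch of the same arithmetic in C, with a bounds check added, follows. The function name and the numbers in the usage note are made up for illustration.

```c
#include <stdint.h>

/*
 * Translate a virtual address that lies inside a section into an offset
 * in the ELF file, given the section's address, file offset and size as
 * reported by readelf -S. Returns 0 on success, -1 if the address is
 * outside the section. An address equal to the section end is accepted
 * so that "end" symbols can be translated too.
 */
int addr_to_file_offset(uint64_t addr, uint64_t sec_addr, uint64_t sec_off,
                        uint64_t sec_size, uint64_t *out)
{
    if (addr < sec_addr || addr - sec_addr > sec_size)
        return -1;
    *out = sec_off + (addr - sec_addr);
    return 0;
}
```

For instance, if `.debug_all` were at address 0x4a1000 with file offset 0xa1000, a symbol at 0x4a1040 would map to file offset 0xa1040.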

After running this, the function names I was expecting were printed out.

If anyone is wondering how I printed out the function names, I wrote some code in Rust using the gimli library. Since this was irrelevant to the question, I didn't include it. I used it to check that the correct debugging information was there, since I couldn't find any magic DWARF numbers online to look for to verify the integrity of the information.
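DWARF has no magic number as such, but the first compilation unit header at the start of `.debug_info` has a fixed layout that can serve as a cheap plausibility check on the copied bytes. The sketch below is not the gimli-based code mentioned above. It assumes little-endian data and the 32-bit DWARF format, accepts versions 2 to 5, and treats only address sizes 4 and 8 as plausible. All names in it are hypothetical. Passing this check does not prove the rest of the section is intact.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a 32-bit-format DWARF compilation unit header. */
struct cu_header {
    uint32_t unit_length;   /* bytes in the unit after this field */
    uint16_t version;
    uint32_t abbrev_offset; /* offset into .debug_abbrev */
    uint8_t  address_size;
};

static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if the start of .debug_info looks like a CU header, else -1. */
int parse_cu_header(const uint8_t *info, size_t info_len, size_t abbrev_len,
                    struct cu_header *out)
{
    if (info == NULL || out == NULL || info_len < 11)
        return -1;

    uint32_t unit_length = read_u32le(info);
    if (unit_length >= 0xfffffff0u)        /* 64-bit DWARF or reserved */
        return -1;
    if ((size_t)unit_length > info_len - 4)
        return -1;

    uint16_t version = read_u16le(info + 4);
    uint32_t abbrev_offset;
    uint8_t address_size;
    if (version >= 2 && version <= 4) {
        /* v2-v4: abbrev offset (4 bytes), then address size (1 byte) */
        abbrev_offset = read_u32le(info + 6);
        address_size = info[10];
    } else if (version == 5) {
        /* v5: unit type (1), address size (1), then abbrev offset (4) */
        if (info_len < 12)
            return -1;
        address_size = info[7];
        abbrev_offset = read_u32le(info + 8);
    } else {
        return -1;
    }

    if (abbrev_offset >= abbrev_len)
        return -1;
    if (address_size != 4 && address_size != 8)  /* assumption, see above */
        return -1;

    out->unit_length = unit_length;
    out->version = version;
    out->abbrev_offset = abbrev_offset;
    out->address_size = address_size;
    return 0;
}
```

Here `info` and `info_len` would come from `__my_debug_info_start` and `__my_debug_info_end`, and `abbrev_len` from the corresponding abbrev symbols.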

The only potential problem is that when running readelf, it outputs:

  [Nr] Name              Type             Address           Offset
   Size              EntSize          Flags  Link  Info  Align
...
readelf: Warning: [ 3]: Link field (0) should index a symtab section.
  [ 3] .rela.plt         RELA             0000000000400168  00000168
   0000000000000228  0000000000000018  AI       0    25     8

But I do not understand what this means and it does not seem to pose a problem.

Please tell me if there is anything I can do to improve this question or answer.

You could be able to get the start and end of your debug sections by declaring (and taking the addresses of) something like `extern void __start_debug_info, __stop_debug_info;` to avoid the symbol placement shenanigans. See this other question: https://stackoverflow.com/questions/16552710/how-do-you-get-the-start-and-end-addresses-of-a-custom-elf-section-in-c-gcc — zneak, Sep 21 '18 at 03:16
@zneak I tried that, replacing `. += __my_old_debug_info_end - __my_old_debug_info_start;` with `. += __stop_debug_info - __start_debug_info;` and etc. in the linker script and removing the first `SECTIONS` block . The linker spit out: ` undefined symbol `__stop_debug_info' referenced in expression`, so it didn't seem to have worked. D= — Winestone, Sep 27 '18 at 16:01
What if you use them from your C code instead of integrating them to the linker script? Is the problem then that your sections aren’t copied to the executable? — zneak, Sep 27 '18 at 16:13
I think the problem would then become figuring out how big the custom `.debug_all` section should be to contain a copy of the debug sections. I tried using `SIZEOF` but that doesn't seem to work. It seems to return 0 if `.debug_all` is `INSERT AFTER .rodata` but if we insert it later, after the last debug section, it returns a non-zero value. But then `readelf` reports `Warning: DIE at offset 0xb refers to abbreviation number 1 which does not exist` and the debug sections do not seem to parse properly. — Winestone, Sep 28 '18 at 08:56
