Using user2162550's suggestion, the code compiled, but the code I had written to print the function names stored in the debugging information printed nothing. I then noticed a comment in the default linker script that gcc uses (which you can see by passing -Wl,--verbose to gcc when linking the executable):
/* DWARF debug sections.
Symbols in the DWARF debugging sections are relative to the beginning
of the section so we begin them at 0. */
...
.debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) }
.debug_abbrev 0 : { *(.debug_abbrev) }
...
This convinced me that it didn't matter where the debugging information ended up in the final binary. So I tried to use the hole trick (from here), but I wasn't sure how to copy the debugging info in before the executable was linked (once the executable is linked, I don't think objcopy works anymore). So I decided to leave some space in the binary that is allocated and loaded, and then, after linking, copy the required sections into that space.
To do this, I used the linker script both to leave a hole and to provide symbols marking where the debugging sections are. The method I got working uses the linker script to first measure the size of each debugging section and then allocate enough space for it. It looks like this (in test.lds):
/* This finds the start and end of each section so we know its size */
SECTIONS
{
    .debug_info 0 : {
        __my_old_debug_info_start = .;
        KEEP (*(.debug_info .gnu.linkonce.wi.*)) *(.debug_info .gnu.linkonce.wi.*)
        __my_old_debug_info_end = .;
    }
    .debug_abbrev 0 : {
        __my_old_debug_abbrev_start = .;
        KEEP (*(.debug_abbrev)) *(.debug_abbrev)
        __my_old_debug_abbrev_end = .;
    }
    .debug_str 0 : {
        __my_old_debug_str_start = .;
        KEEP (*(.debug_str)) *(.debug_str)
        __my_old_debug_str_end = .;
    }
}
INSERT AFTER .rodata;
/* This creates some space in the binary which is loaded and big enough to store all the debugging info, as well as marking the start and end of each area */
SECTIONS
{
    .debug_all : {
        __my_debug_info_start = .;
        . += __my_old_debug_info_end - __my_old_debug_info_start;
        __my_debug_info_end = .;
        __my_debug_abbrev_start = .;
        . += __my_old_debug_abbrev_end - __my_old_debug_abbrev_start;
        __my_debug_abbrev_end = .;
        __my_debug_str_start = .;
        . += __my_old_debug_str_end - __my_old_debug_str_start;
        __my_debug_str_end = .;
    }
}
INSERT AFTER .rodata;
I think the choice of .rodata for INSERT AFTER is arbitrary.
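Once the binary has been linked, it may be worth confirming that the __my_old_* symbols really bracket each debugging section before copying anything. The following is a minimal sketch with a hypothetical helper name, regionSize (it is not part of my script). It reads nm-style "VALUE TYPE NAME" lines on stdin, assuming VALUE is hexadecimal without a 0x prefix (GNU nm's default output), and prints the distance in bytes between PREFIX_start and PREFIX_end.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the size in
# bytes of the region between the symbols <prefix>_start and <prefix>_end.
# Input: nm-style "VALUE TYPE NAME" lines on stdin, VALUE in hex without 0x.
regionSize() {
    local prefix=$1
    local values start end
    # awk prints "START END" on one line (either part may be empty).
    values=$(awk -v s="${prefix}_start" -v e="${prefix}_end" \
        '$3 == s { a = $1 } $3 == e { b = $1 } END { print a, b }')
    start=${values% *}
    end=${values#* }
    if [ -z "$start" ] || [ -z "$end" ]; then
        echo "regionSize: ${prefix}_start or ${prefix}_end not found" >&2
        return 1
    fi
    echo $(( 0x$end - 0x$start ))
}
```

For example, nm -a ./test | regionSize __my_old_debug_info should print the same number (in decimal) as the Size column that readelf -S reports for .debug_info. The -a flag is there because nm may hide symbols defined in debugging sections by default.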
Then, I compiled and linked with:
gcc libtest.a -g -o test -T test.lds -static
Taking inspiration from this, I wrote a bash script that parses the output of readelf and computes where in the binary to read the debugging information from, and where to copy it to so that it gets loaded. The copying is done using dd.
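function getSymbolValue {
    local binary=$1
    local symbol=$2
    # Assumes that this will only find one symbol.
    # readelf -s truncates long symbol names unless --wide is given,
    # so only the first 25 characters are searched for.
    local truncated_symbol
    truncated_symbol=$(echo "$symbol" | cut -c 1-25)
    readelf -s "$binary" | grep "$truncated_symbol" | awk '{print $2}'
}
function getSectionInfo {
    local binary=$1
    local section=$2
    # Returns all but the [Nr] column of the data returned by readelf
    # https://stackoverflow.com/a/3795522/3492895
    # The spaces around the name make the match exact, so that e.g. .debug_str
    # does not also match .debug_str_offsets.
    readelf -S "$binary" | cut -c7- | grep " \.${section} "
}
function getSectionAddress {
    local binary=$1
    local section=$2
    getSectionInfo "$binary" "$section" | awk '{print $3}'
}
function getSectionOffset {
    local binary=$1
    local section=$2
    getSectionInfo "$binary" "$section" | awk '{print $4}'
}
function copyData {
    local binary=$1
    local from_start=$2
    local to_start=$3
    local len=$4
    # Read len bytes starting at byte offset from_start and write them back into
    # the same file at byte offset to_start; conv=notrunc keeps the rest of the file.
    dd iflag=skip_bytes,count_bytes if="$binary" skip="$from_start" count="$len" |
        dd oflag=seek_bytes of="$binary" seek="$to_start" conv=notrunc
}
function copyDebugSection {
    local binary=$1
    local from_section=$2
    local to_section=$3
    local from_off to_section_off to_section_addr to_start_addr to_end_addr
    from_off=$(getSectionOffset "$binary" "$from_section")
    to_section_off=$(getSectionOffset "$binary" "$to_section")
    to_section_addr=$(getSectionAddress "$binary" "$to_section")
    to_start_addr=$(getSymbolValue "$binary" "__my_${from_section}_start")
    to_end_addr=$(getSymbolValue "$binary" "__my_${from_section}_end")
    # readelf prints hex values without a 0x prefix. The destination file offset
    # is the symbol's address relative to the start of to_section, plus the
    # file offset of to_section.
    copyData "$binary" $((0x$from_off)) \
        $((0x$to_start_addr - 0x$to_section_addr + 0x$to_section_off)) \
        $((0x$to_end_addr - 0x$to_start_addr))
}
copyDebugSection ./test 'debug_info' 'debug_all'
copyDebugSection ./test 'debug_abbrev' 'debug_all'
copyDebugSection ./test 'debug_str' 'debug_all'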
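The destination offset in copyDebugSection is computed from a virtual address. This works because a loaded section with file contents occupies one contiguous range both in memory and in the file, so an address inside it is the same distance from the section start in both: file offset = address − section address + section file offset. It does not hold for sections that occupy no space in the file (SHT_NOBITS, such as .bss). A minimal sketch of just that conversion, using a hypothetical function name, vaddrToFileOffset, and taking hex arguments the way readelf prints them (without a 0x prefix):

```shell
#!/bin/bash
# Hypothetical helper: convert a virtual address inside a loaded section with
# file contents into the corresponding byte offset in the ELF file.
# Arguments (hex, no 0x prefix, as printed by readelf):
#   $1 address inside the section
#   $2 section address   (readelf -S "Address" column)
#   $3 section file offset (readelf -S "Offset" column)
vaddrToFileOffset() {
    local addr=$1 section_addr=$2 section_off=$3
    # Same distance from the section start in memory and in the file.
    echo $(( 0x$addr - 0x$section_addr + 0x$section_off ))
}

# Made-up values: 0x4010a0 is 0xa0 bytes into a section at 0x401000 whose
# file offset is 0x1000, so the file offset is 0x10a0.
vaddrToFileOffset 4010a0 401000 1000   # prints 4256
```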
After running this script, the function names I was expecting were printed.
If anyone is wondering how I printed the function names: I wrote some code in Rust using the gimli library. Since this was irrelevant to the question, I didn't include it. I used it to make sure the correct debugging information was there, since I couldn't find any DWARF magic numbers online to check the integrity of the information.
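Short of decoding the DWARF, two cheap sanity checks are possible from the shell. The sketch below uses hypothetical function names (regionsEqual, dwarfVersionAt) and is much weaker than parsing with gimli. The first check uses cmp's optional SKIP1/SKIP2 operands to compare the copied bytes with the original section. The second reads the header of the first unit at a given file offset: in 32-bit DWARF a .debug_info unit starts with a 4-byte unit_length followed by a 2-byte version (2 to 5 in practice), and a unit_length of 0xffffffff marks 64-bit DWARF, where an 8-byte length follows before the version. Because od reads values in host byte order, this assumes a little-endian host inspecting a little-endian binary (e.g. x86-64).

```shell
#!/bin/bash
# Hypothetical sanity checks, not part of the original script.

# Succeeds if LEN bytes at offset FROM and at offset TO in FILE are identical.
# Relies on cmp's optional SKIP1/SKIP2 operands (GNU diffutils, BusyBox).
regionsEqual() {
    local file=$1 from=$2 to=$3 len=$4
    cmp -s -n "$len" "$file" "$file" "$from" "$to"
}

# Print the DWARF version field of the unit header at byte offset OFF in FILE.
# Assumes a little-endian host and binary, since od uses host byte order.
dwarfVersionAt() {
    local file=$1 off=$2 unit_length
    unit_length=$(od -An -tu4 -j "$off" -N4 "$file" | tr -d ' \n')
    if [ "$unit_length" = 4294967295 ]; then
        off=$((off + 12))   # 64-bit DWARF: 0xffffffff, then an 8-byte length
    else
        off=$((off + 4))    # 32-bit DWARF: 4-byte length
    fi
    od -An -tu2 -j "$off" -N2 "$file" | tr -d ' \n'
    echo
}
```

With the offsets computed as in copyDebugSection, regionsEqual would compare the original .debug_info bytes with their copy inside .debug_all, and dwarfVersionAt on the copy's offset should print a small number such as 4 or 5.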
The only potential problem is that running readelf outputs:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
readelf: Warning: [ 3]: Link field (0) should index a symtab section.
[ 3] .rela.plt RELA 0000000000400168 00000168
0000000000000228 0000000000000018 AI 0 25 8
I do not understand what this means, but it does not seem to cause a problem.
Please tell me if there is anything I can do to improve this question or answer.