I just ran into this problem and, unfortunately, I cannot produce a reproducible example. I'll provide the debug data that I can see, and I hope I can get some help, primarily with the following:
- Is my conclusion that read is failing sound?
- If so, given the debug data, under what circumstances would read fail?
The context is: I am building an ARM 32-bit binary as an .elf file under MINGW64 on Windows. I also wrote a user-space "inspector" .c program, compiled for Windows with the MINGW64 gcc compiler, which helps me list the sections in the .elf file and retrieve some data from them; most of the heavy lifting there is done by the https://github.com/TheCodeArtist/elf-parser/ library.
So, I had been happily compiling new code into my .elf and inspecting it with my elf-parser program, with no problems, for days.
Then I made a change in the .elf code, where I simply grouped some global variables into a struct; from that point on, the elf-parser program started failing, reporting null addresses.
I've made builds of the .elf before (pre_project.elf) and after (post_project.elf) this code change, and confirmed that the inspector user-space program (or rather, the elf-parser library) works fine on pre_project.elf but fails on post_project.elf; both of these files contain 33 ELF sections.
Looking deeper, I found that the original point of failure is the read_section_header_table function in elf-parser.c, and I've added the following printout there:
void read_section_header_table(int32_t fd, Elf32_Ehdr eh, Elf32_Shdr sh_table[])
{
    uint32_t i;
    assert(lseek(fd, (off_t)eh.e_shoff, SEEK_SET) == (off_t)eh.e_shoff);
    for(i=0; i<eh.e_shnum; i++) {
        assert(read(fd, (void *)&sh_table[i], eh.e_shentsize)
                == eh.e_shentsize);
        printf(" i %d {sh_name = %d, sh_type = %d, sh_flags = %d, sh_addr = %d, sh_offset = %d, sh_size = %d, sh_link = %d, sh_info = %d, sh_addralign = %d, sh_entsize = %d}\r\n",
            i, sh_table[i].sh_name, sh_table[i].sh_type, sh_table[i].sh_flags, sh_table[i].sh_addr, sh_table[i].sh_offset, sh_table[i].sh_size, sh_table[i].sh_link, sh_table[i].sh_info, sh_table[i].sh_addralign, sh_table[i].sh_entsize
        );
    }
}
The function in my code that eventually calls this one is basically taken from main() in elf-parser-main.c; what they do there before calling it is:
sh_tbl = malloc(eh.e_shentsize * eh.e_shnum);
if(!sh_tbl) {
    printf("Failed to allocate %d bytes\n",
            (eh.e_shentsize * eh.e_shnum));
}
read_section_header_table(fd, eh, sh_tbl);
And I've checked that eh.e_shentsize * eh.e_shnum is correct in both cases (the section header size for 32-bit ELF is 40 bytes, and these files have 33 sections, so 1320 bytes), and the malloc allocation does not trigger an error, so that part should be fine.
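The size arithmetic above can be sanity-checked in isolation. This is a minimal sketch, assuming a hypothetical Shdr32 struct that mirrors the ten 32-bit fields shown in the debug printout (the real code uses the library's Elf32_Shdr) and a hypothetical helper section_table_bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Elf32_Shdr: ten 32-bit fields, 40 bytes in total. */
typedef struct {
    uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
    uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
} Shdr32;

/* Bytes needed for the whole section header table (hypothetical helper). */
static size_t section_table_bytes(uint16_t e_shentsize, uint16_t e_shnum)
{
    return (size_t)e_shentsize * (size_t)e_shnum;
}
```

With e_shentsize = 40 and e_shnum = 33, this gives the 1320 bytes mentioned above.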
Now, first, I've confirmed with readelf that both *.elf files are indeed parseable by the usual tools:
$ arm-none-eabi-readelf -WS pre_project.elf | grep '^There\|^ \[[23]'
There are 33 section headers, starting at offset 0xff440:
[20] .debug_info PROGBITS 00000000 0219fa 04b346 00 0 0 1
[21] .debug_abbrev PROGBITS 00000000 06cd40 00c04c 00 0 0 1
[22] .debug_loc PROGBITS 00000000 078d8c 029b72 00 0 0 1
[23] .debug_aranges PROGBITS 00000000 0a2900 002208 00 0 0 8
[24] .debug_ranges PROGBITS 00000000 0a4b08 007570 00 0 0 8
[25] .debug_line PROGBITS 00000000 0ac078 0306da 00 0 0 1
[26] .debug_str PROGBITS 00000000 0dc752 00c469 01 MS 0 0 1
[27] .debug_frame PROGBITS 00000000 0e8bbc 005ed0 00 0 0 4
[28] .stab PROGBITS 00000000 0eea8c 00006c 0c 29 0 4
[29] .stabstr STRTAB 00000000 0eeaf8 0000e3 00 0 0 1
[30] .symtab SYMTAB 00000000 0eebdc 00b7f0 10 31 2229 4
[31] .strtab STRTAB 00000000 0fa3cc 004ee9 00 0 0 1
[32] .shstrtab STRTAB 00000000 0ff2b5 00018a 00 0 0 1
$ arm-none-eabi-readelf -WS build/post_project.elf | grep '^There\|^ \[[23]'
There are 33 section headers, starting at offset 0xff4e8:
[20] .debug_info PROGBITS 00000000 0219fa 04b3fc 00 0 0 1
[21] .debug_abbrev PROGBITS 00000000 06cdf6 00c05d 00 0 0 1
[22] .debug_loc PROGBITS 00000000 078e53 029b72 00 0 0 1
[23] .debug_aranges PROGBITS 00000000 0a29c8 002208 00 0 0 8
[24] .debug_ranges PROGBITS 00000000 0a4bd0 007570 00 0 0 8
[25] .debug_line PROGBITS 00000000 0ac140 0306da 00 0 0 1
[26] .debug_str PROGBITS 00000000 0dc81a 00c44a 01 MS 0 0 1
[27] .debug_frame PROGBITS 00000000 0e8c64 005ed0 00 0 0 4
[28] .stab PROGBITS 00000000 0eeb34 00006c 0c 29 0 4
[29] .stabstr STRTAB 00000000 0eeba0 0000e3 00 0 0 1
[30] .symtab SYMTAB 00000000 0eec84 00b7f0 10 31 2229 4
[31] .strtab STRTAB 00000000 0fa474 004ee9 00 0 0 1
[32] .shstrtab STRTAB 00000000 0ff35d 00018a 00 0 0 1
So, all looks good there.
Anyway, running "inspector.exe --elf-file pre_project.elf" results in this printout near the end of the loop:
$ inspector.exe --elf-file pre_project.elf
...
i 24 {sh_name = 329, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 674568, sh_size = 30064, sh_link = 0, sh_info = 0, sh_addralign = 8, sh_entsize = 0}
i 25 {sh_name = 343, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 704632, sh_size = 198362, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 0}
i 26 {sh_name = 355, sh_type = 1, sh_flags = 48, sh_addr = 0, sh_offset = 902994, sh_size = 50281, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 1}
i 27 {sh_name = 366, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 953276, sh_size = 24272, sh_link = 0, sh_info = 0, sh_addralign = 4, sh_entsize = 0}
i 28 {sh_name = 379, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 977548, sh_size = 108, sh_link = 29, sh_info = 0, sh_addralign = 4, sh_entsize = 12}
i 29 {sh_name = 385, sh_type = 3, sh_flags = 0, sh_addr = 0, sh_offset = 977656, sh_size = 227, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 0}
i 30 {sh_name = 1, sh_type = 2, sh_flags = 0, sh_addr = 0, sh_offset = 977884, sh_size = 47088, sh_link = 31, sh_info = 2229, sh_addralign = 4, sh_entsize = 16}
i 31 {sh_name = 9, sh_type = 3, sh_flags = 0, sh_addr = 0, sh_offset = 1024972, sh_size = 20201, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 0}
i 32 {sh_name = 17, sh_type = 3, sh_flags = 0, sh_addr = 0, sh_offset = 1045173, sh_size = 394, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 0}
...
... and all looks good. However, running the program on the post_project.elf file results in:
$ inspector.exe --elf-file post_project.elf
...
i 24 {sh_name = 329, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 674768, sh_size = 30064, sh_link = 0, sh_info = 0, sh_addralign = 8, sh_entsize = 0}
i 25 {sh_name = 343, sh_type = 1, sh_flags = 0, sh_addr = 0, sh_offset = 704832, sh_size = 198362, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 0}
i 26 {sh_name = 355, sh_type = 1, sh_flags = 48, sh_addr = 0, sh_offset = 903194, sh_size = 50250, sh_link = 0, sh_info = 0, sh_addralign = 1, sh_entsize = 1}
i 27 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
i 28 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
i 29 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
i 30 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
i 31 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
i 32 {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 0, sh_offset = 0, sh_size = 0, sh_link = 0, sh_info = 0, sh_addralign = 0, sh_entsize = 0}
...
... and later on, these null offset addresses cause segfaults/corruption.
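Since the zeroed entries only cause trouble later, it may help to detect them right after the table has been read. This is a minimal sketch, assuming the table is a contiguous array of e_shnum entries of e_shentsize bytes each; first_null_section is a hypothetical helper, not part of elf-parser. Index 0 is skipped because the ELF format reserves it as an all-zero entry.

```c
#include <stddef.h>

/* Returns the first index >= 1 whose entry consists only of zero bytes,
 * or -1 if there is no such entry. */
static int first_null_section(const void *table, size_t entsize, size_t e_shnum)
{
    const unsigned char *p = table;
    for (size_t i = 1; i < e_shnum; i++) {
        size_t k = 0;
        while (k < entsize && p[i * entsize + k] == 0)
            k++;
        if (k == entsize)
            return (int)i;   /* every byte of entry i is zero */
    }
    return -1;
}
```

Called as first_null_section(sh_tbl, eh.e_shentsize, eh.e_shnum), it would be expected to report 27 for the post_project.elf run shown above.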
So, once the read_section_header_table function hits section index i==27 (seemingly .debug_frame), the read(fd, (void *)&sh_table[i], eh.e_shentsize) call basically reads all zeroes (and writes them) into the sh_table[i] structure(s). And mind you, this does not trip the assert that wraps it, so the system considers a proper 40 bytes to have been read in these calls as well!
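Because the check lives inside assert, it only says pass or fail (and if the program were ever built with NDEBUG defined, the read call inside it would not be executed at all). This is a minimal sketch of a more talkative alternative, assuming the POSIX-style read from <unistd.h> that the original code uses; read_exact is a hypothetical helper, not part of elf-parser. It keeps reading until the requested count arrives, and reports an error or a premature end-of-file separately:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Reads exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on a read error or if end-of-file arrives early. */
static int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            fprintf(stderr, "read error after %lu of %lu bytes: %s\n",
                    (unsigned long)done, (unsigned long)len, strerror(errno));
            return -1;
        }
        if (n == 0) {                   /* read reports end-of-file */
            fprintf(stderr, "unexpected end of file after %lu of %lu bytes\n",
                    (unsigned long)done, (unsigned long)len);
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

In the loop of read_section_header_table, the body could call read_exact(fd, &sh_table[i], eh.e_shentsize) and print i whenever it returns -1, which would show whether read is returning fewer bytes than requested.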
And note also that the reads for post_project.elf before index 27 actually look quite reasonable (say, for i==26, sh_offset = 903194 = 0xdc81a, the same offset reported by objdump for the same file)?!
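The decimal values in the printout can be compared with the hex column of the readelf listing more directly by formatting them the same way. This is a minimal sketch; format_offset is a hypothetical helper that renders a 32-bit offset as zero-padded six-digit hex, as in the listings above:

```c
#include <stdint.h>
#include <stdio.h>

/* Writes off as zero-padded lowercase hex (at least 6 digits) into out. */
static void format_offset(uint32_t off, char *out, size_t out_len)
{
    snprintf(out, out_len, "%06lx", (unsigned long)off);
}
```

For sh_offset = 903194 this yields 0dc81a, matching the .debug_str row for post_project.elf.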
The only way I can describe this so far is that read is basically failing in the middle of reading a file?!
I've never experienced anything like this, so I'm really wondering under what possible conditions read would fail here, considering that:
- If the elf-parser library were all that wrong in its pointer arithmetic, it should also have failed on pre_project.elf, which it didn't (and in fact, it ran fine for days).
- If post_project.elf itself were corrupt as an ELF file, then readelf should not have been able to process it either, and it does.
- Maybe post_project.elf was on disk with a corrupt sector, but I tried copying both it and the executable to several different paths on my system, and they all result in the same read failure (and a corrupt sector would have tripped readelf too).
The only thing I can see as a possible reason here is that, since read is a syscall, it in principle "asks" the OS (here Windows) for data. Maybe Windows somehow flagged post_project.elf as a virus or something; then we start reading, and once Windows realizes something is reading this file, it stops delivering data?! But shouldn't that have resulted in a read failure at least? (And besides, why flag an .elf file as a virus? It's not even a Windows executable!)