Essentially, I am doing something similar to https://wiki.osdev.org/ELF_Tutorial, where I load the data into structs and read the various sections by their offsets. The host is little endian and I'm trying to analyze files that were cross-compiled for a big endian target. I tried doing the same code sequence with these big endian files as with the little endian files, but the code segfaults when trying to access the sections.

int fd = open(filename, O_RDONLY);
char *header_start = (char *)mmap(0, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
Elf32_Ehdr* elf_ehdr = (Elf32_Ehdr *)header_start;
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + elf_ehdr->e_shoff);
Elf32_Shdr* sh_strtab = &elf_shdrs[elf_ehdr->e_shstrndx];
// code segfaults here when trying to access sh_strtab->sh_offset for big endian
// files, but works just fine for little endian files

Why does the code fail for big endian files?

srujzs
  • Are you trying to *analyze* binaries cross-compiled for a different target? If that segfaults, you're doing it wrong, but we'd need to see your reading code. A [mcve], so to speak... Or are you trying to *run* cross-compiled binaries for a different target? Those, *of course*, wouldn't run... – DevSolar Mar 22 '18 at 23:39
  • This is probably a good example of a question that would benefit from a [MCVE]. A general "here's how to parse an ELF file in an endian-agnostic manner" answer is tutorial-length, whereas what you want to know is why your attempt at implementing such a program is crashing. – Lightness Races in Orbit Mar 22 '18 at 23:41
  • Added code to replicate the issue. I'm trying to analyze the files, and definitely not trying to run them. – srujzs Mar 22 '18 at 23:48
  • `elf_ehdr->e_shoff` is presumably an integer that will be stored big endian in a big endian file. If your CPU is little endian, you're going to need to flip the endian before you use it in any math. – user4581301 Mar 22 '18 at 23:57
  • I think that (converting the integers to little endian) did the trick, thanks! I've edited the question to be more focused. Please feel free to add your comment as an answer, or if you would like, I can go ahead and reiterate what you have said. – srujzs Mar 23 '18 at 00:50

1 Answer

In a big endian file elf_ehdr->e_shoff is going to be a big endian integer, and the big endian byte order needs to be respected.

Say we're dealing in 32 bits and e_shoff is a nice small number like 64. In big endian it is going to be recorded in the file as the bytes 00 00 00 40. But you're reading this file on what appears to be a little endian CPU, so those four bytes are read out of the file as a binary blob and interpreted by the CPU as 0x40000000, which is 1073741824.

Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + elf_ehdr->e_shoff);

resolves to

Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + 1073741824);

not

Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + 64);

and is going to miss the target by a wide margin. Trying to access members of the resulting elf_shdrs wanders into undefined behaviour.
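The arithmetic above can be checked with a small sketch. The names AsLittleEndian, AsBigEndian and kShoffBytes are made up for illustration and are not part of any ELF header or library; the sketch only assembles the same four bytes in both byte orders.

```cpp
#include <cstdint>

// The four bytes a big endian ELF32 file would hold for e_shoff == 64.
constexpr unsigned char kShoffBytes[4] = {0x00, 0x00, 0x00, 0x40};

// What a little endian CPU sees when it loads those bytes as one integer:
// the first byte in the file is treated as the least significant.
constexpr std::uint32_t AsLittleEndian(const unsigned char* b) {
    return (static_cast<std::uint32_t>(b[0]))
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// What the file actually means: the first byte is the most significant.
constexpr std::uint32_t AsBigEndian(const unsigned char* b) {
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         | (static_cast<std::uint32_t>(b[3]));
}
```

Here AsLittleEndian(kShoffBytes) gives the bogus offset 1073741824, while AsBigEndian(kShoffBytes) gives the intended 64.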

A quick hack fix is

Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)(header_start + ResolveEndian(elf_ehdr->e_shoff));

where ResolveEndian is a series of overloaded functions that either do absolutely nothing, because the file endian matches the system endian, or flip the byte order. For many examples of how to do this, see "How do I convert between big-endian and little-endian values in C++?"
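One possible shape for ResolveEndian, as a rough sketch only. It differs from the one-argument call above in that it takes a second, hypothetical file_is_big_endian parameter, which the caller would derive from e_ident[EI_DATA] == ELFDATA2MSB. SwapBytes and kHostIsBigEndian are also made-up names, and the host check assumes the __BYTE_ORDER__ macros that GCC and Clang predefine (C++20 code could use std::endian instead).

```cpp
#include <cstdint>

// Assumption: GCC/Clang predefined macros describe the host byte order.
constexpr bool kHostIsBigEndian = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);

// Reverse the byte order of a 16-, 32- or 64-bit unsigned value.
constexpr std::uint16_t SwapBytes(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

constexpr std::uint32_t SwapBytes(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

constexpr std::uint64_t SwapBytes(std::uint64_t v) {
    // Swap each 32-bit half, then exchange the halves.
    return (static_cast<std::uint64_t>(SwapBytes(static_cast<std::uint32_t>(v))) << 32)
         | SwapBytes(static_cast<std::uint32_t>(v >> 32));
}

// Do nothing when the file and host byte orders match, otherwise flip.
constexpr std::uint16_t ResolveEndian(std::uint16_t v, bool file_is_big_endian) {
    return file_is_big_endian == kHostIsBigEndian ? v : SwapBytes(v);
}

constexpr std::uint32_t ResolveEndian(std::uint32_t v, bool file_is_big_endian) {
    return file_is_big_endian == kHostIsBigEndian ? v : SwapBytes(v);
}

constexpr std::uint64_t ResolveEndian(std::uint64_t v, bool file_is_big_endian) {
    return file_is_big_endian == kHostIsBigEndian ? v : SwapBytes(v);
}
```

Every multi-byte field read through the mapped structs needs the same treatment, not just e_shoff: e_shstrndx, e_shnum, sh_offset, sh_size and so on.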

The longer fix would not use memory mapped files and would instead deserialize the file taking into account the differences in variable sizes (and the resulting differences in offsets) between 32 and 64 bit programs as well as endian. This will result in a more robust and portable parser that will always work regardless of the source ELF and the compiler implementation used to build the parser.
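A minimal sketch of that deserializing approach, assuming C++14 or later and limited to the header fields needed to locate the section header table. SectionTableInfo, ReadUint and ReadSectionTableInfo are hypothetical names invented here; the byte offsets are those of the standard ELF32 and ELF64 header layouts, and the byte order is taken from e_ident[EI_DATA] rather than from the host.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not part of any ELF library.
struct SectionTableInfo {
    bool ok;                  // false if the header is missing or malformed
    std::uint64_t shoff;      // e_shoff
    std::uint16_t shentsize;  // e_shentsize
    std::uint16_t shnum;      // e_shnum
    std::uint16_t shstrndx;   // e_shstrndx
};

// Assemble an n-byte unsigned integer (n <= 8) from raw bytes,
// independent of the byte order of the machine running the parser.
constexpr std::uint64_t ReadUint(const unsigned char* p, std::size_t n, bool big_endian) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Always consume the most significant byte first.
        const std::size_t idx = big_endian ? i : n - 1 - i;
        value = (value << 8) | p[idx];
    }
    return value;
}

constexpr SectionTableInfo ReadSectionTableInfo(const unsigned char* file, std::size_t size) {
    SectionTableInfo info = {false, 0, 0, 0, 0};
    // e_ident is 16 bytes: the magic, then EI_CLASS at index 4 and EI_DATA at index 5.
    if (size < 16 || file[0] != 0x7f || file[1] != 'E' || file[2] != 'L' || file[3] != 'F') {
        return info;
    }
    const bool is64 = (file[4] == 2);  // ELFCLASS64
    const bool big = (file[5] == 2);   // ELFDATA2MSB
    if ((file[4] != 1 && !is64) || (file[5] != 1 && !big)) {
        return info;  // unknown class or data encoding
    }
    if (size < (is64 ? 64u : 52u)) {
        return info;  // file too short to hold a full ELF header
    }
    // Field offsets and widths differ between the 32-bit and 64-bit layouts.
    info.shoff = ReadUint(file + (is64 ? 40 : 32), is64 ? 8 : 4, big);
    info.shentsize = static_cast<std::uint16_t>(ReadUint(file + (is64 ? 58 : 46), 2, big));
    info.shnum = static_cast<std::uint16_t>(ReadUint(file + (is64 ? 60 : 48), 2, big));
    info.shstrndx = static_cast<std::uint16_t>(ReadUint(file + (is64 ? 62 : 50), 2, big));
    info.ok = true;
    return info;
}
```

A real parser would also check that shoff + shnum * shentsize lies within the file before touching any section header, and would read each section header field by field in the same way.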

user4581301