
I am trying to wrap my head around the workings of the ELF file format and the way objects get linked and executed.

In order to learn more I tried to analyze the output of the following assembler code:

#----------------------------------------------------------------------------------------
# Exits immediately using code 3. Runs on 64-bit Linux 
#     gcc -c basic.s && ld basic.o && ./a.out
#----------------------------------------------------------------------------------------

        .global _start

        .text
_start:
        # _exit(3)
        mov $60, %rax       # system call 60 is exit
        mov $3,  %rdi       # we want return code 3
        syscall             # invoke operating system to exit

I am using readelf and hd to analyze the output a.out.

Here are the relevant outputs:

readelf -l a.out:

Elf file type is EXEC (Executable file)
Entry point 0x401000
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000010 0x0000000000000010  R E    0x1000

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .text

Excerpts of hd a.out:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  00 10 40 00 00 00 00 00  |..>.......@.....|
00000020  40 00 00 00 00 00 00 00  e0 10 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  02 00 40 00 05 00 04 00  |....@.8...@.....|
00000040  01 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00  |..@.......@.....|
00000060  b0 00 00 00 00 00 00 00  b0 00 00 00 00 00 00 00  |................|
00000070  00 10 00 00 00 00 00 00  01 00 00 00 05 00 00 00  |................|
00000080  00 10 00 00 00 00 00 00  00 10 40 00 00 00 00 00  |..........@.....|
00000090  00 10 40 00 00 00 00 00  10 00 00 00 00 00 00 00  |..@.............|
000000a0  10 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
[...]

and

[...]
00001000  48 c7 c0 3c 00 00 00 48  c7 c7 03 00 00 00 0f 05  |H..<...H........|
[...]

As you can see from this analysis, there is a program section (a LOAD segment in the readelf output) holding '.text', which is 0x10 (16) bytes long and contains the instructions "MOV, MOV, SYSCALL" at file offset 0x1000, all as expected. However, there is also another program section, this one without a name, covering file offsets 0x0000 to 0x00b0, which is exactly the space occupied by the ELF header + program header 0 + program header 1 (64 + 56 + 56 = 176 = 0xb0 bytes). I have tried a minimal C program, and gcc creates a similar program section there as well.
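To double-check the arithmetic behind that observation, here is a minimal Python sketch that decodes the first 0xb0 bytes copied from the hd excerpt above. The helper names (`parse_elf_header`, `parse_program_headers`) are made up for this illustration; the field layouts are those of the standard 64-bit little-endian `Elf64_Ehdr` and `Elf64_Phdr` structures.

```python
import struct

# First 0xb0 bytes of a.out, copied from the hd excerpt in the question.
HEX = """
7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00
02 00 3e 00 01 00 00 00  00 10 40 00 00 00 00 00
40 00 00 00 00 00 00 00  e0 10 00 00 00 00 00 00
00 00 00 00 40 00 38 00  02 00 40 00 05 00 04 00
01 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00
00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00
b0 00 00 00 00 00 00 00  b0 00 00 00 00 00 00 00
00 10 00 00 00 00 00 00  01 00 00 00 05 00 00 00
00 10 00 00 00 00 00 00  00 10 40 00 00 00 00 00
00 10 40 00 00 00 00 00  10 00 00 00 00 00 00 00
10 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00
"""
data = bytes.fromhex("".join(HEX.split()))

EHDR_FMT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr, little-endian
PHDR_FMT = "<IIQQQQQQ"           # Elf64_Phdr, little-endian

def parse_elf_header(buf):
    """Hypothetical helper: unpack the ELF header fields used below."""
    (ident, e_type, machine, version, entry, phoff, shoff, flags,
     ehsize, phentsize, phnum, shentsize, shnum, shstrndx) = struct.unpack_from(EHDR_FMT, buf, 0)
    return {"entry": entry, "phoff": phoff, "ehsize": ehsize,
            "phentsize": phentsize, "phnum": phnum}

def parse_program_headers(buf, hdr):
    """Hypothetical helper: unpack every program header entry."""
    out = []
    for i in range(hdr["phnum"]):
        off = hdr["phoff"] + i * hdr["phentsize"]
        p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align = \
            struct.unpack_from(PHDR_FMT, buf, off)
        out.append({"type": p_type, "flags": p_flags, "offset": p_offset,
                    "vaddr": p_vaddr, "filesz": p_filesz, "align": p_align})
    return out

hdr = parse_elf_header(data)
phdrs = parse_program_headers(data, hdr)

# Size of the ELF header plus the whole program header table.
header_bytes = hdr["ehsize"] + hdr["phnum"] * hdr["phentsize"]
print(hex(header_bytes))
for ph in phdrs:
    print(hex(ph["offset"]), hex(ph["vaddr"]), hex(ph["filesz"]), ph["flags"])
```

The first LOAD entry has `filesz` equal to `header_bytes`, so it covers the ELF header and the program header table and nothing else. The second has `filesz` 0x10, the 16 bytes of machine code.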

The question: Why? To what end? Why is it necessary to define this program section and what does this section have to do with the execution of this program?

Bonus question: Why is the actual machine code put at offset 0x1000? Wouldn't it have been more efficient to put it at 0x00b0, right after the other one?


I am using Ubuntu 20.04 with gcc 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) on an Intel x86_64 CPU.

  • The code can't be at 0xb0 because protections are page granular. That said older versions of binutils did put them into the same page, which was a security issue and has been fixed. – Jester Jun 03 '21 at 19:02
  • Having a segment for the program headers themselves ensures they are loaded and made available to the interpreter (dynamic linker). The ELF specification does not seem to be clear about whether this is strictly required. Do you get this segment even for static binary? Of course the toolchain might not care and include it anyway. – Jester Jun 03 '21 at 19:12
  • @Jester now that you mention it, I indeed did not consider that the two sections were exactly a page away. I have tried using gcc -c -static, but the same section is present ... I indeed did not find anything in the specs, so I might try to write my own ELF without that program section and see if it runs. Thank you for the help! – MisterCavespider Jun 03 '21 at 20:09
  • @Jester: `ld basic.o` does make a static ELF executable. They're only using the `gcc` front-end to assemble, not link, so it's a misleading title. (`gcc -nostdlib -static foo.s` would make approximately the same executable all in one step, but with a `.note.gnu.build-id` section.) – Peter Cordes Jun 03 '21 at 21:36
  • @MisterCavespider: Note that 0x000...0b0 is the *size*, not the offset. Each row has offset:size stacked vertically. (With FileSiz and MemSiz specified separately). Also related: [Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?](https://stackoverflow.com/q/65037919) / [Why do my results different following along the tiny asm example?](https://stackoverflow.com/q/65461235) – Peter Cordes Jun 03 '21 at 21:42
  • Some other relevant answers: https://stackoverflow.com/a/44938843/50617 https://stackoverflow.com/a/43699979/50617 – Employed Russian Jun 04 '21 at 01:42
  • @EmployedRussian @PeterCordes thank you for the links, they sent me in the right direction! As for `-nostdlib`: I intentionally used a `.s` file as input, so as to omit any overhead from the `exit()` functionality, but apparently `-static` was not enough; using both `-static -nostdlib` makes a minimal file! Also, I used the word 'offset' to name 'address 0xb0 in the file', as the ELF specs are pretty hard on the difference between 'offset' and 'address'. – MisterCavespider Jun 04 '21 at 18:56
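
The page-granularity point from the comments can be illustrated with a small sketch. It assumes 4 KiB pages (matching the 0x1000 Align value in the readelf output). The `packed` layout is hypothetical: it is what the bonus question proposes, not something the linker produced here. The function names are invented for the illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as the Align column suggests

def pages(vaddr, memsz):
    """Return the set of page base addresses touched by [vaddr, vaddr + memsz)."""
    first = vaddr & ~(PAGE_SIZE - 1)
    last = (vaddr + memsz - 1) & ~(PAGE_SIZE - 1)
    return set(range(first, last + 1, PAGE_SIZE))

def shared_pages(layout):
    """Pages that both the header segment and the code segment land on."""
    h_vaddr, h_size, _ = layout["headers"]
    t_vaddr, t_size, _ = layout["text"]
    return pages(h_vaddr, h_size) & pages(t_vaddr, t_size)

# Layout reported by readelf -l: (vaddr, memsz, flags)
actual = {"headers": (0x400000, 0xb0, "R"),
          "text":    (0x401000, 0x10, "R E")}

# Hypothetical layout: code packed directly after the headers.
packed = {"headers": (0x400000, 0xb0, "R"),
          "text":    (0x4000b0, 0x10, "R E")}

print(sorted(hex(p) for p in shared_pages(actual)))
print(sorted(hex(p) for p in shared_pages(packed)))
```

In the actual layout the two segments share no page, so the headers can be mapped read-only and the code read+execute. In the packed layout both fall on the page at 0x400000. Memory protections apply to whole pages, so that page would have to be executable for the code to run, which would make the header bytes executable as well. That appears to be the concern Jester describes.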
