I am trying to wrap my head around the workings of the ELF file format and the way objects get linked and executed.
To learn more, I analyzed the linked output of the following assembly code:
#----------------------------------------------------------------------------------------
# Exits immediately using code 3. Runs on 64-bit Linux
# gcc -c basic.s && ld basic.o && ./a.out
#----------------------------------------------------------------------------------------
.global _start
.text
_start:
# _exit(3)
mov $60, %rax # system call 60 is exit
mov $3, %rdi # we want return code 3
syscall # invoke operating system to exit
I am using readelf and hd to analyze the output, a.out.
Here are the relevant outputs:
readelf -l a.out:
Elf file type is EXEC (Executable file)
Entry point 0x401000
There are 2 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz             Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0 R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000010 0x0000000000000010 R E    0x1000
Section to Segment mapping:
  Segment Sections...
   00
   01     .text
Excerpts of hd a.out:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 00 10 40 00 00 00 00 00 |..>.......@.....|
00000020 40 00 00 00 00 00 00 00 e0 10 00 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 02 00 40 00 05 00 04 00 |....@.8...@.....|
00000040 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 |..@.......@.....|
00000060 b0 00 00 00 00 00 00 00 b0 00 00 00 00 00 00 00 |................|
00000070 00 10 00 00 00 00 00 00 01 00 00 00 05 00 00 00 |................|
00000080 00 10 00 00 00 00 00 00 00 10 40 00 00 00 00 00 |..........@.....|
00000090 00 10 40 00 00 00 00 00 10 00 00 00 00 00 00 00 |..@.............|
000000a0 10 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
[...]
and
[...]
00001000 48 c7 c0 3c 00 00 00 48 c7 c7 03 00 00 00 0f 05 |H..<...H........|
[...]
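As a sanity check on that last line, here is a rough Python sketch that splits the bytes at file offset 0x1000 into instructions. The byte string is copied from the hd excerpt above. The helper name `split_instructions` is made up for illustration, and it only recognizes the two encodings that occur in this dump, so it is not a general disassembler.

```python
import struct

# Bytes copied from the hd line at file offset 0x1000.
TEXT = bytes.fromhex("48c7c03c000000" "48c7c703000000" "0f05")

def split_instructions(code):
    """Decode only the encodings present in this dump (hypothetical helper)."""
    # ModRM bytes seen in the dump: 0xC0 selects %rax, 0xC7 selects %rdi.
    regs = {0xC0: "%rax", 0xC7: "%rdi"}
    out = []
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x48\xc7":
            # REX.W prefix + opcode C7: mov with a 32-bit immediate into a 64-bit register.
            (imm,) = struct.unpack_from("<i", code, i + 3)
            out.append(f"mov ${imm}, {regs[code[i + 2]]}")
            i += 7
        elif code[i:i + 2] == b"\x0f\x05":
            out.append("syscall")
            i += 2
        else:
            raise ValueError(f"unrecognized bytes at offset {i:#x}")
    return out

if __name__ == "__main__":
    print(len(TEXT), split_instructions(TEXT))
```

The two 7-byte moves plus the 2-byte syscall add up to 16 bytes, which matches the FileSiz of 0x10 that readelf reports for the second LOAD entry.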
As you can see from this analysis, there is a section called '.text', mapped into the second LOAD segment, which is 0x10 (16) bytes long and contains the instructions "MOV, MOV, SYSCALL" at file offset 0x1000, all as expected. However, there is also another LOAD segment, this one with no sections mapped to it, covering file offsets 0x0000 to 0x00b0, which is exactly the space occupied by the ELF header + program header 0 + program header 1. I have tried a minimal C program, and gcc creates a similar segment there as well.
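To double-check my reading of the dump, here is a minimal Python sketch that unpacks the ELF header and both program headers from the first 0xb0 bytes shown by hd. The field layout is the standard little-endian ELF64 one; the function name `parse` and the dictionary keys are just my own labels, not anything from a library.

```python
import struct

# First 0xb0 bytes of a.out, copied from the hd excerpt above.
HEAD = bytes.fromhex("""
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
02 00 3e 00 01 00 00 00 00 10 40 00 00 00 00 00
40 00 00 00 00 00 00 00 e0 10 00 00 00 00 00 00
00 00 00 00 40 00 38 00 02 00 40 00 05 00 04 00
01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00
00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00
b0 00 00 00 00 00 00 00 b0 00 00 00 00 00 00 00
00 10 00 00 00 00 00 00 01 00 00 00 05 00 00 00
00 10 00 00 00 00 00 00 00 10 40 00 00 00 00 00
00 10 40 00 00 00 00 00 10 00 00 00 00 00 00 00
10 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
""")

# ELF64 header: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
# ELF64 program header: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")

def parse(head):
    """Return entry point, end of the header area, and the segment list."""
    eh = EHDR.unpack_from(head, 0)
    e_entry, e_phoff = eh[4], eh[5]
    e_phentsize, e_phnum = eh[9], eh[10]
    segments = []
    for n in range(e_phnum):
        p = PHDR.unpack_from(head, e_phoff + n * e_phentsize)
        # Keep (file offset, virtual address, file size, flags) per segment.
        segments.append((p[2], p[3], p[5], p[1]))
    return {
        "entry": e_entry,
        "headers_end": e_phoff + e_phnum * e_phentsize,
        "segments": segments,
    }

if __name__ == "__main__":
    info = parse(HEAD)
    print(hex(info["entry"]), hex(info["headers_end"]))
    for off, vaddr, filesz, flags in info["segments"]:
        print(hex(off), hex(vaddr), hex(filesz), flags)
```

The program header table starts at offset 0x40 and holds two entries of 0x38 bytes each, so the headers end at 0x40 + 2 * 0x38 = 0xb0, which is the FileSiz of the first LOAD entry.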
The question: Why? To what end? Why is it necessary to define this first segment, and what does it have to do with the execution of this program?
Bonus question: Why is the actual machine code put at offset 0x1000? Wouldn't it have been more efficient to put it at 0x00b0, right after the first segment?
I am using Ubuntu 20.04 with gcc 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) on an Intel x86_64 CPU.