I have some questions about the memory layout of a program in Linux. I know from various sources (I'm reading "Programming from the Ground Up") that each section is loaded into it's own region of memory. The text section loads first at virtual address 0x8048000, the data section is loaded immediately after that, next is the bss section, followed by the heap and the stack.
To experiment with the layout I made this program in assembly. First it prints the addresses of some labels and calculates the system break point. Then it enters into an infinite loop. The loop increments a pointer and then it tries to access the memory at that address, at some point a segmentation fault will exit the program (I did this intentionally).
This is the program:
.section .data
start_data:
str_mem_access:
.ascii "Accessing address: 0x%x\n\0"
str_data_start:
.ascii "Data section start at: 0x%x\n\0"
str_data_end:
.ascii "Data section ends at: 0x%x\n\0"
str_bss_start:
.ascii "bss section starts at: 0x%x\n\0"
str_bss_end:
.ascii "bss section ends at: 0x%x\n\0"
str_text_start:
.ascii "text section starts at: 0x%x\n\0"
str_text_end:
.ascii "text section ends at: 0x%x\n\0"
str_break:
.ascii "break at: 0x%x\n\0"
end_data:
.section .bss
start_bss:
.lcomm buffer, 500
.lcomm buffer2, 250
end_bss:
.section .text
start_text:
.globl _start
_start:
# print address of start_text label
pushl $start_text
pushl $str_text_start
call printf
addl $8, %esp
# print address of end_text label
pushl $end_text
pushl $str_text_end
call printf
addl $8, %esp
# print address of start_data label
pushl $start_data
pushl $str_data_start
call printf
addl $8, %esp
# print address of end_data label
pushl $end_data
pushl $str_data_end
call printf
addl $8, %esp
# print address of start_bss label
pushl $start_bss
pushl $str_bss_start
call printf
addl $8, %esp
# print address of end_bss label
pushl $end_bss
pushl $str_bss_end
call printf
addl $8, %esp
# get last usable virtual memory address
movl $45, %eax
movl $0, %ebx
int $0x80
incl %eax # system break address
# print system break
pushl %eax
pushl $str_break
call printf
addl $4, %esp
movl $start_text, %ebx
loop:
# print address
pushl %ebx
pushl $str_mem_access
call printf
addl $8, %esp
# access address
# segmentation fault here
movb (%ebx), %dl
incl %ebx
jmp loop
end_loop:
movl $1, %eax
movl $0, %ebx
int $0x80
end_text:
And this the relevant parts of the output (this is Debian 32bit):
text section starts at: 0x8048190
text section ends at: 0x804823b
Data section start at: 0x80492ec
Data section ends at: 0x80493c0
bss section starts at: 0x80493c0
bss section ends at: 0x80493c0
break at: 0x83b4001
Accessing address: 0x8048190
Accessing address: 0x8048191
Accessing address: 0x8048192
[...]
Accessing address: 0x8049fff
Accessing address: 0x804a000
Violación de segmento
My questions are:
1) Why is my program starting at address 0x8048190 instead of 0x8048000? With this I guess that the instruction at the "_start" label is not the first thing to load, so what's between the addresses 0x8048000 and 0x8048190?
2) Why is there a gap between the end of the text section and the start of the data section?
3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?
4) If the system break point is at 0x83b4001, why I get the segmentation fault earlier at 0x804a000?