I'm trying to understand where exactly the executable code (assembly) of a program ends up when the program is loaded and running. I found two resources discussing this, but they are somewhat difficult to read:
- Understanding ELF using readelf and objdump Linux article (code formatting is messed up)
- Michael Guyver, Some Assembly Required*: Relocations, Relocations (lots of assembly which I'm not exactly proficient in)
So, here's a brief example: I'm interested in where the executable section of the tail program ends up. Basically, objdump tells me this:
$ objdump -dj .text /usr/bin/tail | head -10
/usr/bin/tail: file format elf32-i386
Disassembly of section .text:
08049100 <.text>:
8049100: 31 ed xor %ebp,%ebp
8049102: 5e pop %esi
8049103: 89 e1 mov %esp,%ecx
...
I'm assuming I'd see a call to tail's main() being made here, had the symbols not been stripped. Anyway, the start of the executable section is, according to this, 0x08049100; I'm interested in where it ends up eventually.
Then, I run tail in the background, getting its pid:
$ /usr/bin/tail -f & echo $!
28803
... and I inspect its /proc/pid/maps:
$ cat /proc/28803/maps
00547000-006a8000 r-xp 00000000 08:05 3506 /lib/i386-linux-gnu/libc-2.13.so
...
008c6000-008c7000 r-xp 00000000 00:00 0 [vdso]
08048000-08054000 r-xp 00000000 08:05 131469 /usr/bin/tail
08054000-08055000 r--p 0000b000 08:05 131469 /usr/bin/tail
08055000-08056000 rw-p 0000c000 08:05 131469 /usr/bin/tail
08af1000-08b12000 rw-p 00000000 00:00 0 [heap]
b76de000-b78de000 r--p 00000000 08:05 139793 /usr/lib/locale/locale-archive
...
bf845000-bf866000 rw-p 00000000 00:00 0 [stack]
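To check which of these mappings contains the .text start address reported by objdump, the maps lines can be parsed and searched. This is only a minimal sketch: the names Mapping, parse_maps_line and find_mapping are made up for illustration, and the sample lines are copied from the output above.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int   # first virtual address of the mapping
    end: int     # one past the last virtual address
    perms: str   # e.g. "r-xp"
    offset: int  # offset into the backing file
    path: str    # backing file, pseudo-name like [heap], or ""


def parse_maps_line(line: str) -> Mapping:
    """Parse one line in /proc/<pid>/maps format (hypothetical helper)."""
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16),
                   fields[1], int(fields[2], 16), path)


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping whose [start, end) range contains addr, if any."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# Sample lines copied from the /proc/28803/maps output in the question.
SAMPLE = """\
08048000-08054000 r-xp 00000000 08:05 131469 /usr/bin/tail
08054000-08055000 r--p 0000b000 08:05 131469 /usr/bin/tail
08055000-08056000 rw-p 0000c000 08:05 131469 /usr/bin/tail
08af1000-08b12000 rw-p 00000000 00:00 0 [heap]
"""

mappings = [parse_maps_line(line) for line in SAMPLE.splitlines()]
hit = find_mapping(mappings, 0x08049100)  # .text start according to objdump
if hit is not None:
    print(f"{hit.start:#010x}-{hit.end:#010x} {hit.perms} {hit.path}")
```

For the sample lines, the address 0x08049100 falls inside the r-xp mapping of /usr/bin/tail that begins at 0x08048000, so the .text address itself is unchanged; only the enclosing region starts earlier.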
Now I have tail three times - but the executable segment r-xp (which is the .text?) is apparently at 0x08048000 (an address that was apparently standardized back with SYSV for x86; also see Anatomy of a Program in Memory : Gustavo Duarte for an image).
Using the gnuplot script below, I arrived at this image:
The first (topmost) plot shows the "File offset" of sections from objdump (starting from 0x0); the middle plot shows the "VMA" (virtual memory address) of sections from objdump; and the bottom plot shows the layout from /proc/pid/maps - both of these starting from 0x08048000. All three plots span a range of the same size (0xe000).
Comparing the topmost and middle plots, it seems that the sections are translated more or less "as is" from the executable file to the VMA addresses (apart from the end), such that the whole executable file (not just the .text section) starts from 0x08048000.
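The same translation can be checked numerically from the maps output, since each mapping lists the file offset it starts at. The sketch below uses hypothetical helper names and hard-codes the three /usr/bin/tail mappings from the question; it assumes nothing beyond those numbers.

```python
# (start, end, file offset) of the three /usr/bin/tail mappings,
# copied from the /proc/28803/maps output in the question.
TAIL_MAPPINGS = [
    (0x08048000, 0x08054000, 0x00000000),  # r-xp
    (0x08054000, 0x08055000, 0x0000B000),  # r--p
    (0x08055000, 0x08056000, 0x0000C000),  # rw-p
]


def vma_to_file_offset(vma):
    """Return the file offset backing `vma`, or None if no tail mapping covers it."""
    for start, end, offset in TAIL_MAPPINGS:
        if start <= vma < end:
            return offset + (vma - start)
    return None


def mapping_bias(start, offset):
    """Constant difference between virtual address and file offset in one mapping."""
    return start - offset


print(hex(vma_to_file_offset(0x08049100)))  # .text start according to objdump
for start, end, offset in TAIL_MAPPINGS:
    print(hex(start), "bias", hex(mapping_bias(start, offset)))
```

With these numbers, the first mapping has a bias of 0x08048000 while the last two have 0x08049000, which would be consistent with the translation being "as is" everywhere except at the end.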
But comparing the middle and bottom plots, it seems that when a program is running in memory, only .text is "pushed back" to 0x08048000 - and not only that, it now appears larger!
The only explanation I have so far is something I read somewhere (but lost the link): that an image in memory has to be allocated as a whole number of pages (of size e.g. 4096 bytes) and has to start on a page boundary. The whole number of pages explains the larger size - but, given that all these are virtual addresses, why the need to "snap" them to a page boundary (could one not just as well map the virtual address as is to a physical page boundary)?
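The page-granularity idea can be tried out with a little arithmetic. The sketch below assumes a 4096-byte page, as in the paragraph above, and uses hypothetical helper names. It rounds a start address down and an end address up to page boundaries, then counts the pages in the r-xp mapping from the maps output.

```python
PAGE_SIZE = 4096  # assumed page size, as mentioned in the question


def page_floor(addr, page=PAGE_SIZE):
    """Round addr down to the nearest page boundary."""
    return addr & ~(page - 1)


def page_ceil(addr, page=PAGE_SIZE):
    """Round addr up to the nearest page boundary."""
    return (addr + page - 1) & ~(page - 1)


def page_span(start, end, page=PAGE_SIZE):
    """Return (aligned start, aligned end, page count) covering [start, end)."""
    lo, hi = page_floor(start, page), page_ceil(end, page)
    return lo, hi, (hi - lo) // page


text_start = 0x08049100  # .text start according to objdump
print(hex(page_floor(text_start)))

# The r-xp mapping of /usr/bin/tail from the maps output.
lo, hi, pages = page_span(0x08048000, 0x08054000)
print(hex(lo), hex(hi), pages)
```

Rounding the .text start down gives 0x08049000 rather than 0x08048000, so page rounding alone would not account for where the mapping begins. That would be consistent with the earlier observation that the whole file, not just .text, is placed from 0x08048000. The r-xp mapping itself spans 0xc000 bytes, i.e. 12 pages of 4096 bytes.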
So - could someone explain why /proc/pid/maps sees the .text section in a different virtual address region from objdump?
mem.gp gnuplot script:
#!/usr/bin/env gnuplot
set term wxt size 800,500
exec = "/usr/bin/tail" ;
# cannot do - apparently gnuplot waits for children to exit, so locks here:
#runcmd = "bash -c '" . exec . " -f & echo $!'"
#print runcmd
#pid = system(runcmd) ;
#print runcmd, "pid", pid
# run tail -f & echo $! in another shell; then enter pid here:
pid = 28803
# $1 Idx $2 Name $3 Size $4 VMA $5 LMA $6 File off
cmdvma = "<objdump -h ".exec." | awk '$1 ~ \"^[0-9]+$\" && $2 !~ \".gnu_debuglink\" {print $1, $2, \"0X\"$3, \"0X\"$4;}'" ;
cmdfo = "<objdump -h ".exec." | awk '$1 ~ \"^[0-9]+$\" && $2 !~ \".gnu_debuglink\" {print $1, $2, \"0X\"$3, \"0X\"$6;}'" ;
cmdmaps = "<cat /proc/".pid."/maps | awk '{split($1,a,\"-\");b1=strtonum(\"0x\"a[1]);b2=strtonum(\"0x\"a[2]);printf(\"%d \\\"%s\\\" 0x%08X 0x%08X\\n\", NR,$6,b2-b1,b1);}'"
print cmdvma
print cmdfo
print cmdmaps
set format x "0x%08X" # "%016X";
set xtics rotate by -45 font ",7";
unset ytics
unset colorbox
set cbrange [0:25]
set yrange [0.5:1.5]
set macros
set multiplot layout 3,1 columnsfirst
# 0x08056000-0x08048000 = 0xe000
set xrange [0:0xe000]
set tmargin at screen 1
set bmargin at screen 0.667+0.1
plot \
cmdfo using 4:(1+$0*0.01):4:($4+$3):0 with xerrorbars lc palette t "File off", \
cmdfo using 4:(1):2 with labels font ",6" left rotate by -45 t ""
set xrange [0x08048000:0x08056000]
set tmargin at screen 0.667
set bmargin at screen 0.333+0.1
plot \
cmdvma using 4:(1+$0*0.01):4:($4+$3):0 with xerrorbars lc palette t "VMA", \
cmdvma using 4:(1):2 with labels font ",6" left rotate by -45 t ""
set tmargin at screen 0.333
set bmargin at screen 0+0.1
plot \
cmdmaps using 4:(1+$0*0.01):4:($4+$3):0 with xerrorbars lc palette t "/proc/pid/maps" , \
cmdmaps using 4:(1):2 with labels font ",6" left rotate by -45 t ""
unset multiplot
# killall expects a process name, so kill is used with the pid instead:
#system("kill -9 " . pid) ;