return address indeed should belong in the .text
Not always, but most likely yes. Usually all the executable code of a program resides in the .text section of the executable. As Peter Cordes notes in his comment below, there are cases where a function defined in the .text section of a program does not return into .text: for example, if the function is used as a callback for some library function (e.g. qsort), or if it's a signal handler, an atexit() handler, etc. In those cases the caller is library code (or a signal trampoline), so the return address points there rather than into the program's own .text. These are however rare occurrences in plain C.
Why guess though? You can inspect the program under a debugger (e.g. GDB) and see this for yourself:
$ gdb ./program
(gdb) b some_function
(gdb) run
Breakpoint 1, 0xAAABBB in some_function from ./program
(gdb) backtrace
#0 0xAAABBB in some_function from ./program
#1 0xCCCDDD in some_other_function from ./program <== here's the return address
...
(gdb) info inferior
Num Description Executable
* 1 process 11107 ./program
(gdb) !cat /proc/11107/maps
555555554000-555555558000 r--p 00000000 103:05 1443328 ./program
555555558000-55555556b000 r-xp 00004000 103:05 1443328 ./program <=== .text
55555556b000-555555574000 r--p 00017000 103:05 1443328 ./program
555555575000-555555576000 r--p 00020000 103:05 1443328 ./program
555555576000-555555577000 rw-p 00021000 103:05 1443328 ./program
...
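The manual check above (does the return address fall inside the executable mapping of ./program?) can also be scripted. The following is a minimal sketch, not a definitive tool: the helper names parse_maps and find_mapping are made up for illustration, the sample text reuses the first three maps lines shown above, and ret_addr is a hypothetical address chosen only so that it lands in the r-xp region.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps content into (start, end, perms, path) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "-" not in fields[0]:
            continue  # skip blank or elided lines such as "..."
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = " ".join(fields[5:])  # empty for anonymous mappings
        mappings.append((start, end, fields[1], path))
    return mappings


def find_mapping(mappings, addr):
    """Return the mapping whose [start, end) range contains addr, or None."""
    for mapping in mappings:
        if mapping[0] <= addr < mapping[1]:
            return mapping
    return None


SAMPLE = """\
555555554000-555555558000 r--p 00000000 103:05 1443328 ./program
555555558000-55555556b000 r-xp 00004000 103:05 1443328 ./program
55555556b000-555555574000 r--p 00017000 103:05 1443328 ./program
"""

# Hypothetical return address, picked to fall inside the r-xp mapping.
ret_addr = 0x555555559A10
hit = find_mapping(parse_maps(SAMPLE), ret_addr)
if hit is not None and "x" in hit[2]:
    print(f"{ret_addr:#x} is in an executable mapping of {hit[3]}")
```

For a live process you would feed it the real file instead of SAMPLE, e.g. open("/proc/11107/maps").read(), and the address shown in frame #1 of the backtrace.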
what are the approximate range of stack addresses, bss addresses and text addresses?
This is impossible to answer without inspecting the program and without knowing exactly what system you are running the program on and its configuration. If we are talking about an ELF running on Linux, which I'm going to assume here since it's the most common case, then you can try figuring it out like this:
Understand whether the ELF you are working with is a shared object (ET_DYN) or a simple executable (ET_EXEC). This can be done with readelf -h program, which will show something like this:
Type: DYN (Shared object file)
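If you would rather check this programmatically, the same information is stored in the e_type field of the ELF header, a 16-bit value at byte offset 16 (ET_EXEC is 2, ET_DYN is 3). Below is a small sketch under those assumptions; the function name elf_type is hypothetical, and the header bytes are synthetic, built here only so the example runs without a real binary.

```python
import struct

# e_type values from the ELF specification.
ET_NAMES = {2: "EXEC", 3: "DYN"}


def elf_type(header: bytes) -> str:
    """Return "EXEC", "DYN" or a hex string for the e_type of an ELF header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type sits right after the 16-byte e_ident, for both ELF32 and ELF64.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


# Synthetic 64-bit little-endian header with e_type = ET_DYN (illustration only).
fake_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", 3)
print(elf_type(fake_header))  # prints "DYN"

# With a real file you would do something like:
#   with open("program", "rb") as f:
#       print(elf_type(f.read(18)))
```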
If you are dealing with a shared object (DYN), which is also how position-independent executables (PIE) are reported, then you cannot know its base virtual address without running it, because that is decided by the kernel. The kernel picks an appropriate address at load time, randomized if ASLR is enabled (which it normally is by default).
The approximate location of your ELF in virtual memory (and therefore of its .text section) should be around 0x56555000 for Linux x86 32-bit, without taking ASLR into account. See also this other answer of mine on "Why does Linux favor 0x7f mappings?", which explains the logic behind this for x86-64 (and it similarly also applies to x86 32-bit).
ASLR makes things more complex as you might guess, because you will need to find a way to calculate the exact address you need at runtime. The simplest way in your scenario is probably to leak some return address that is already on the stack using your format string vulnerability, and then do some simple math (subtraction/addition) to figure out the address you want to return to.
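The "simple math" mentioned above boils down to two steps: subtract the leaked return address's known offset within the ELF to get the load base, then add the offset of the place you want to return to. Here is a minimal sketch; the function name rebase and all three numbers are made up for illustration. In practice the offsets would come from the binary itself (e.g. from readelf or a disassembler), and the leaked value from the format string output.

```python
def rebase(leaked_addr, leaked_offset, target_offset):
    """Compute (load_base, target_address) from one leaked code address.

    leaked_offset and target_offset are offsets from the start of the ELF
    image, which stay the same across runs; only the base changes with ASLR.
    """
    base = leaked_addr - leaked_offset
    return base, base + target_offset


# Hypothetical values, for illustration only.
leaked = 0x56556213          # return address leaked from the stack
base, target = rebase(leaked, leaked_offset=0x1213, target_offset=0x11ED)

# Sanity check: a load base should be page-aligned.
assert base % 0x1000 == 0
print(hex(base), hex(target))  # prints "0x56555000 0x565561ed"
```

If the computed base is not page-aligned, the leaked value was most likely not the return address you assumed it was.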
If instead you are dealing with an executable (EXEC), then the base virtual address will be fixed and known even before running the program. The output of readelf -WS program will tell you the address of the .text section, and if symbols are available, readelf -Ws program will also tell you the virtual address of any function defined in the program.