In a C program, does return address of a function frame point to the .text section?

Question

I'm attempting a small CTF, and I'm trying to overflow the return address to inject shellcode. I expected the return address to be a very low one, since it should point to an instruction (hence I assumed the .text segment).

Using a format string vulnerability, I explored the memory and found some value in the range of 1f7ffd. I thought it was probably the return address, and from my knowledge the address just before it should be the value of ebp saved from the previous function frame. Hence I overwrote the return address with ebp + some offset. However, it seems that overwriting the stored ebp crashes the program immediately, while overwriting the return address seems to crash the code only after one more loop iteration.

This made me question whether I'm overwriting the right thing. So I thought I should ask:

return address indeed should belong in the .text
what are the approximate range of stack addresses, bss addresses and text addresses? (I thought the stack is around 7fffffff, bss is around 55555555, and everything significantly lower than this is text, assuming that the system is some Linux flavor and 32-bit.)

Return address points to the next instruction to execute after the return, which is of course code, therefore it goes in the text section which stores all the code. However, it can point to any executable page - the CPU doesn't know the difference. — user253751, Jul 15 '22 at 13:57
@user253751 yes, but the untampered one should be in .text right? — Nikolai Savulkin, Jul 15 '22 at 13:59
Yes, usually. Not 100% of the time. E.g. web browsers convert JavaScript to executable code on the heap. — user253751, Jul 15 '22 at 14:04

Marco Bonelli · Answer 1 · 2022-07-16T12:07:33.180

return address indeed should belong in the .text

Not always, but most likely yes. Usually all the executable code of a program resides in the .text section of the executable. As Peter Cordes notes in his comment below, there might be instances where a function defined in the .text section of a program does not return into the program's own .text. For example, if the function is used as a callback for some library function (e.g. qsort), or if it's a signal handler, an atexit() handler, etc. These are however rare occurrences in plain C.

Why guess though? You can inspect the program under a debugger (e.g. GDB) and see this for yourself:

$ gdb ./program

(gdb) b some_function
(gdb) run

Breakpoint 1, 0xAAABBB in some_function from ./program
(gdb) backtrace
#0  0xAAABBB in some_function from ./program
#1  0xCCCDDD in some_other_function from ./program  <== here's the return address
...

(gdb) info inferior
  Num  Description       Executable
* 1    process 11107     ./program

(gdb) !cat /proc/11107/maps
555555554000-555555558000 r--p 00000000 103:05 1443328 ./program
555555558000-55555556b000 r-xp 00004000 103:05 1443328 ./program <=== .text
55555556b000-555555574000 r--p 00017000 103:05 1443328 ./program
555555575000-555555576000 r--p 00020000 103:05 1443328 ./program
555555576000-555555577000 rw-p 00021000 103:05 1443328 ./program
...

what are the approximate range of stack addresses, bss addresses and text addresses?

This is impossible to answer without inspecting the program and without knowing exactly what system you are running the program on and its configuration. If we are talking about an ELF running on Linux, which I'm going to assume here since it's the most common case, then you can try figuring it out like this:

Understand whether the ELF you are working with is a shared object (ET_DYN) or a simple executable (ET_EXEC). This can be done with readelf -h program, which will show something like this:
```
Type:      DYN (Shared object file)
```
If you are dealing with a shared object (DYN), then you cannot know its base virtual address without running it, because the kernel decides it at load time. The kernel picks an appropriate address when loading the program, and randomizes it if ASLR is enabled (which it normally is by default).

The approximate location of your ELF in virtual memory (and therefore of its .text section) should be around 0x56555000 for Linux x86 32-bit, without taking ASLR into account. See also this other answer of mine on "Why does Linux favor 0x7f mappings?", which explains the logic behind this for x86-64 (and it similarly also applies to x86 32-bit).

ASLR makes things more complex, as you might guess, because you will need to find a way to calculate the exact address you need at runtime. The simplest way in your scenario is probably to leak some return address that is already on the stack using your format string vulnerability, and then do some simple math (subtraction/addition) to figure out the address you want to return to.

If instead you are dealing with an executable (EXEC), then the base virtual address will be fixed and known even before running the program. The output of readelf -WS program will tell you the address of the .text section, and if symbols are available, readelf -Ws program will also tell you the virtual address of any function defined in the program.

If using a library with callback functions (such as libc.so's `qsort`), code in the executable can be called with a return address in a library's `.text`. Another rarer exception to the rule is if you're using GNU C nested functions, and take a pointer to a nested function. GCC makes asm that builds a trampoline *on the stack*, and enables `-z execstack` for the whole executable. Err, but that trampoline just jumps, I think, so you don't have a return address pointing to stack memory, nevermind that part. — Peter Cordes, Jul 15 '22 at 16:02
