Running address of an application, followed by heap and stack expansions

Question

I have an m.c:

extern void a(char*);

int main(int ac, char **av){
    static char string [] = "Hello , world!\n";
    a(string);
}

and an a.c:

#include <unistd.h>
#include <string.h>

void a(char* s){
    write(1, s, strlen(s));
}

I compile and build these as:

g++ -c -g -std=c++14 -MMD -MP -MF "m.o.d" -o m.o m.c
g++ -c -g -std=c++14 -MMD -MP -MF "a.o.d" -o a.o a.c
g++ -o linux m.o a.o -lm -lpthread -ldl

Then I examine the executable, linux, like this:

objdump -drwxCS -Mintel linux

The output of this on my Ubuntu 16.04.6 starts off with:

start address 0x0000000000400540

Later comes the .init section:

00000000004004c8 <_init>:
  4004c8:   48 83 ec 08             sub    rsp,0x8

Finally comes the .fini section:

0000000000400704 <_fini>:
  400704:   48 83 ec 08             sub    rsp,0x8
  400708:   48 83 c4 08             add    rsp,0x8
  40070c:   c3                      ret

The program references the string "Hello , world!\n", which is in the .data section, as shown by the command:

objdump -sj .data linux

Contents of section .data:
 601030 00000000 00000000 00000000 00000000  ................
 601040 48656c6c 6f202c20 776f726c 64210a00  Hello , world!..

All of this tells me that the executable has been created to be loaded at actual memory addresses starting from around 0x400540 (the start address; .init begins just below it, at 0x4004c8), and that the program accesses data at actual memory addresses extending until at least 0x601040 (where the string sits in .data).

I base this on Chapter 7 of "Linkers & Loaders" by John R Levine, where he states:

A linker combines a set of input files into a single output file that is ready to be loaded at a specific address.

My question is about the next line.

If, when the program is loaded, storage at that address isn't available, the loader has to relocate the loaded program to reflect the actual load address.

(1) Suppose another executable that is currently running on my machine is already using the memory between 400540 and 601040. How is it decided where to load my new executable, linux?

(2) Related to this, in Chapter 4, it is stated:

..ELF objects...are loaded in about the middle of the address space so the stack can grow down below the text segment and the heap can grow up from the end of the data, keeping the total address space in use relatively compact.

Suppose a previously running application started at, say, 200000, and now linux starts around 400540. There is no clash or overlap of memory addresses. But as the programs continue, suppose the heap of the previous application creeps up to 300000 while the stack of the newly launched linux grows downward to 310000. Soon the memory addresses will clash/overlap. What happens when the clash eventually occurs?

Martin Rosenau · Accepted Answer · 2020-08-07T06:27:05.327

If, when the program is loaded, storage at that address isn't available, the loader has to relocate the loaded program to reflect the actual load address.

Not all file formats support this:

GCC for 32-bit Windows will add the information required for the loader in the case of dynamic libraries (.dll). However, the information is not added to executable files (.exe), so such an executable file must be loaded to a fixed address.

Under Linux it is a bit more complicated, but there too many executable files (typically older 32-bit ones, and also non-PIE 64-bit ones) cannot be loaded to a different address, while dynamic libraries (.so) can.

Suppose I have another executable that is currently running on my machine already using the memory space between 400540 and 601040 ...

Modern computers (including every x86 CPU since the 32-bit 80386) have a paging MMU, which most modern operating systems use. This is a circuit (typically in the CPU) that translates the addresses seen by the software (virtual addresses) into the addresses seen by the RAM (physical addresses). In your example, 400540 could be translated to 1234540 (translation works on whole pages, so the offset within the page, 540, stays the same), so accessing address 400540 actually accesses address 1234540 in RAM.

The point is: modern OSs use a different MMU configuration for each task. If you start your program a second time, a different MMU configuration is used, which translates the address 400540 seen by the software to, say, address 2345540 in RAM. Both programs can use address 400540 at the same time, because when they access it, one program actually accesses address 1234540 in RAM and the other accesses address 2345540.

This means that an address such as 400540 is never "already in use" when an executable file is loaded.

The address may already be in use when a dynamic library (.so/.dll) is loaded, because such libraries share the address space of the executable that loads them.

... how is it decided where to start my new executable linux?

Under Linux, the executable file will be loaded to its fixed link-time address if it was linked in a way that does not allow it to be moved (a non-PIE executable). As already said, this was typical for older 32-bit files, but it is also how Ubuntu 16.04's GCC builds 64-bit executables by default, so your program is one of them: the "Hello , world!" string will be at address 0x601040 every time it runs.

However, executables built as PIE (position-independent executables), which newer distributions produce by default, can be loaded to a different address. Linux loads them to a random address (ASLR) for security reasons, making it more difficult for viruses or other malware to attack the program.

... so the stack can grow down below the text segment ...

I've never seen this memory layout in any operating system:

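Both under Linux and under Solaris the stack was located at the end of the user address space (on 32-bit x86 Linux, somewhere around 0xBFFFFF00), while the text segment was loaded quite close to the start of the address space (for example 0x08048000 for 32-bit Linux executables, or 0x400000 for a non-PIE x86-64 executable like yours).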

... and the heap can grow up from the end of the data, ...

suppose the heap of the previous application creeps up ...

Many malloc() implementations since the late 1990s no longer rely only on the heap. Instead, they also use mmap() to reserve new memory.

According to the brk() manual page, brk() and sbrk() were marked LEGACY in SUSv2 and removed from POSIX.1-2001, so new programs should not call them directly.

(However, as Peter Cordes points out, current glibc malloc() still uses the heap via brk() for small allocations and uses mmap() for large ones.)

Unlike "simple" operating systems such as MS-DOS, Linux does not let a program simply start using heap memory; the program (normally through malloc()) must first call brk() to tell Linux how much heap it wants.

If a program requests more heap than is available, brk() fails (returning -1 with errno set to ENOMEM) and malloc() returns NULL.

However, this situation typically happens because no more RAM is available and not because the heap overlaps with some other memory area.

... while the stack of the newly launched linux has grown downward to ...

Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?

Indeed, the size of the stack is limited.

If you use too much stack, you have a "stack overflow".

This x86-64 assembly program intentionally uses too much stack, just to see what happens:

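# build e.g.: gcc -nostdlib -o example_program example.s
.globl _start
_start:
    sub $0x100000, %rsp    # move the stack pointer down by 1 MiB
    push %rax              # write below it; the kernel grows the stack on demand
    push %rax
    jmp _start             # repeat until the stack size limit is exceeded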

In the case of an operating system with an MMU (such as Linux), your program will crash with an error message:

~$ ./example_program
Segmentation fault (core dumped)
~$

EDIT/ADDENDUM

Is stack for all running programs located at the "end"?

In older Linux versions, the stack was located near (but not exactly at) the end of the virtual memory accessible by the program: Programs could access the address range from 0 to 0xBFFFFFFF in those Linux versions. The initial stack pointer was located around 0xBFFFFE00. (The command line arguments and environment variables came after the stack.)

And is this the end of actual physical memory? Will not the stack of different running programs then get mixed up? I was under the impression that all of the stack and memory of a program remains contiguous in actual physical memory, ...

On a computer using an MMU, the program never sees physical memory:

When the program is loaded, the OS searches for some free area of RAM; say it finds one at physical address 0xABC000. Then it configures the MMU so that the virtual addresses 0xBFFFF000-0xBFFFFFFF are translated to the physical addresses 0xABC000-0xABCFFF.

This means: Whenever the program accesses address 0xBFFFFE20 (for example using a push operation), the physical address 0xABCE20 in the RAM is actually accessed.

A program has no way at all to access a specific physical address directly.

If you have another program running, the MMU is configured in a way that the addresses 0xBFFFF000-0xBFFFFFFF are translated to the addresses 0x345000-0x345FFF when the other program is running.

So if one of the two programs performs a push operation while the stack pointer is 0xBFFFFE20, the address 0xABCE20 in RAM will be accessed; if the other program performs a push operation (with the same stack pointer value), the address 0x345E20 will be accessed.

Therefore, the stacks will not mix up.

OSs not using an MMU but supporting multi-tasking (examples are the Amiga 500 or early Apple Macintoshes) will of course not work this way. Such OSs use special file formats (and not ELF) which are optimized for running multiple programs without MMU. Compiling programs for such OSs is much more complex than compiling programs for Linux or Windows. And there are even restrictions for the software developer (example: functions and arrays should not be too long).

Also, does each program have its own stack pointer, base pointer, registers, etc.? Or does the OS just have one set of these registers to be shared by all programs?

Assuming a single-core CPU, the CPU has one set of registers, and only one program can run at a time.

When you start multiple programs, the OS will switch between the programs. This means program A runs for (for example) 1/50 second, then program B runs for 1/50 second, then program A runs for 1/50 second and so on. It appears to you as if the programs run at the same time.

When the OS switches from program A to program B, it must first save the values of the registers (of program A). Then it must change the MMU configuration. Finally it must restore program B's register values.

Modern Linux distros make 32-bit PIEs. You say that 64-bit Linux executables are usually relocatable, but most 32-bit executables aren't. Is that just considering the weight of history? x86-64 was already widespread for years before PIE executables started to become a thing; e.g. the OP's Ubuntu 16.04 is making non-PIE executables by default; those can't be ASLRed. GCC will have used instructions like `mov edi, offset .LC0` to put static addresses into registers, because the default non-PIE code model guarantees that static code/data is in the low 31 bits of address space. — Peter Cordes, Aug 06 '20 at 06:16
Also you're talking about the OP's program in the same paragraph as 32-bit. It's 64-bit, as we can tell for sure from the disassembly. Also, binutils `ld` has a different default base address for `.text` in 32-bit mode. The OP's program is a non-PIE (ELF type EXEC) x86-64 executable. Not relocatable: no relocation metadata to apply fixups to static data or code, and no requirement that either be position-independent. — Peter Cordes, Aug 06 '20 at 06:20
Current glibc `malloc` still uses `brk` for small allocations, `mmap` for large allocations (so it can definitely give the pages back to the OS, not getting stuck with it on the free list). There's a tuning heuristic, IIRC the cutoff is a couple pages or maybe even 64k. `strace ls` and see it use some `brk` syscalls. (The overall *point* of your answer is correct, of course; virtual memory makes it a non-problem. But unfortunately some of the specific details aren't right.) — Peter Cordes, Aug 06 '20 at 06:23
@PeterCordes I assumed that most modern x86 Linux distros are 64-bit, often not even supporting 32-bit programs without installing additional packages. So when writing about 32-bit programs, I meant old programs. Therefore, the word "typical" referred to the "average program" in the years 1995-2018, not to the "average program" in one of the few 32-bit distros that still exist. — Martin Rosenau, Aug 06 '20 at 06:32
@PeterCordes I updated the sentences about the 32-bit executable files in my answer. I also added an explanation about how heap is used in Linux and what happens if there is no more heap. — Martin Rosenau, Aug 06 '20 at 06:47
@MartinRosenau, is stack for all running programs located at the "end"? And is this the end of actual physical memory? Will not the stack of different running programs then get mixed up? I was under the impression that all of the stack and memory of a program remains contiguous in actual physical memory, growing and coming down as and when needed, but still staying contiguous. Also, does each program have its own stack pointer, base pointer, registers, etc.? Or does the OS just have one set of these registers to be shared by all programs? — Tryer, Aug 07 '20 at 00:59

score 2 · Answer 2 · answered Aug 06 '20 at 03:41

Yes, objdump on this executable shows the addresses where its segments will be mapped. (Linking collects sections into segments; see "What's the difference of section and segment in ELF file format".) .data and .text get linked into different segments with different permissions (read+write vs. read+exec).

If, when the program is loaded, storage at that address isn't available

That could only happen when loading a dynamic library, not the executable itself. Virtual memory means that each process has its own private virtual address space, even if they were started from the same executable. (This is also why ld can always pick the same default base address for the text and data segments, not trying to slot every executable and library on the system into a different spot in a single address space.)

An executable is the first thing that gets to lay claim to parts of that address space, when it's loaded/mapped by the OS's ELF program loader. That's why traditional (non-PIE) ELF executables can be non-relocatable, unlike ELF shared objects like /lib/libc.so.6.

If you single-step a program with a debugger, or include a sleep, you'll have time to look at less /proc/<PID>/maps. Or run cat /proc/self/maps to have cat show you its own map. (Also /proc/self/smaps for more detailed info on each mapping, like how much of it is dirty, whether it uses hugepages, etc.)

(Newer GNU/Linux distros configure GCC to make PIE executables by default; see "32-bit absolute addresses no longer allowed in x86-64 Linux?". In that case objdump would only show addresses relative to a base of 0 or 1000 or so, and compiler-generated asm would use PC-relative addressing, not absolute.)
