Static code/data at low addresses, stack at high addresses, is the traditional model. x86-64 follows that; i386 was the unusual one. (With "the heap" in the middle, even though that's not a real thing in asm; there's .data/.bss above .text, brk adding more space just past .bss, and mmap picking random addresses in between.)
The i386 layout left room to put the stack below code, but modern Linux didn't do that anyway. You still get stack addresses like 0xffffe000 in 32-bit code (e.g. under a 64-bit kernel). I'm not sure where a modern build of a 32-bit kernel would put user-space stacks. Of course that's just for the main thread's stack; stacks for new threads have to be allocated manually, usually with mmap.
Why 0x400000 (4 MiB) specifically for the ld default base address?
High enough to avoid mmap_min_addr (default 64k) and leave a gap so a NULL deref is still likely to fault noisily instead of silently reading code, even if it's something like ptr[i] with some large i. But otherwise, near the bottom of virtual address space is a good place.
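To put a number on "some large i", here is a small sketch of the arithmetic. It assumes the 64k mmap_min_addr default and the 0x400000 base discussed here; the function name `first_index_reaching` is made up for illustration.

```python
MMAP_MIN_ADDR = 0x10000   # 64 KiB default mentioned above
IMAGE_BASE = 0x400000     # 4 MiB base address being discussed

def first_index_reaching(base, elem_size):
    """Smallest i for which (NULL + i * elem_size) is at or above base."""
    return -(-base // elem_size)  # ceiling division

if __name__ == "__main__":
    for elem_size in (1, 4, 8):
        i = first_index_reaching(IMAGE_BASE, elem_size)
        print(f"elem_size={elem_size}: ptr[i] reaches the image at i >= {i}")
    # The unmapped gap below the image is far larger than the mmap_min_addr floor.
    print(f"gap is {IMAGE_BASE // MMAP_MIN_ADDR}x mmap_min_addr")
```

So with a NULL `int *`, an index has to be at least 1048576 before the access could land on mapped code at the default base; smaller indices stay in the unmapped gap (assuming nothing else is mapped there).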
Also to optimize the page tables: they're a sparse radix tree (diagram in this answer). Ideally the pages in use share as many higher levels of the tree as possible, so higher levels of the tree have mostly "not present" entries. That means less for the kernel to allocate & manage, and the HW page-table walker can internally cache higher-level entries (PDE cache) to speed up TLB misses in 4k pages when they're in the same 2M, 1G, or 512G region. And the page walker(s) access memory through cache, so smaller page tables also mean less cache footprint from those accesses.
0x400000 = 4MiB. It's the start of a 2MiB group of pages near the start of the low 1GiB of virtual address space. So an executable with larger code and/or static data that needs multiple pages will have them all in the same subtree of the page tables, touching as few different 1G and 2M regions as possible.
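As a rough illustration of where 0x400000 sits in the radix tree, this sketch splits an address into the index used at each level of x86-64 4-level paging (9 bits per level, 12-bit page offset). The function and key names are my own labels, not anything from the kernel.

```python
def page_table_indices(vaddr):
    """Split a 48-bit x86-64 virtual address into its 4-level paging indices."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,  # each entry covers 512 GiB
        "pdpt": (vaddr >> 30) & 0x1FF,  # each entry covers 1 GiB
        "pd": (vaddr >> 21) & 0x1FF,    # each entry covers 2 MiB
        "pt": (vaddr >> 12) & 0x1FF,    # each entry covers 4 KiB
        "offset": vaddr & 0xFFF,        # byte offset within the 4 KiB page
    }

if __name__ == "__main__":
    for addr in (0x400000, 0x401000, 0x600000):
        print(hex(addr), page_table_indices(addr))
```

For 0x400000 every index is 0 except the page-directory index, which is 2: the third 2MiB slot of the first 1GiB region, at the very start of that slot's page table.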
Well, almost as few 1G regions as possible: starting at 0x40000000 (1 GiB) would have put it at the very start of a 1GiB region, not skipping the first two 2MiB large pages of it. But that only matters if your static data size is just below 1GiB; otherwise you either still fit in the first 1GiB hugepage region, or would extend into the 2nd one anyway.
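A quick way to check that edge case is to count how many aligned regions an image spans for a given base and size. This is only a sketch of the arithmetic; `regions_touched` and the example sizes are hypothetical, picked to show the "just below 1GiB" case.

```python
MiB = 1 << 20
GiB = 1 << 30

def regions_touched(base, size, region_size):
    """Number of aligned region_size-sized regions covered by [base, base + size)."""
    first = base // region_size
    last = (base + size - 1) // region_size
    return last - first + 1

if __name__ == "__main__":
    almost_1g = GiB - 2 * MiB  # an image size just below 1 GiB
    for base in (0x400000, 0x40000000):
        print(hex(base),
              "small image:", regions_touched(base, 10 * MiB, GiB),
              "almost-1GiB image:", regions_touched(base, almost_1g, GiB))
```

A 10MiB image touches one 1GiB region from either base. The image of 1GiB minus 2MiB touches two 1GiB regions when based at 0x400000 but only one when based at 0x40000000, which is the only situation where the 4MiB offset costs anything.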
Basically a duplicate of Why Linux/gnu linker chose address 0x400000? - when I answered that, I forgot I'd already answered this.