The layout of the virtual address space of a process is generally arranged by the linker and the loader. Broadly, the memory of a process can be organized into types including:
- The “text” section, which contains the instructions that are executed.
- Read-only data, which contains values that are read but are not modified or executed.
- Initialized data, which is set to initial values at program start-up but may be modified during program execution.
- Uninitialized data, which the program needs for work space but does not need to be set to any particular values at program start-up.
There are various embellishments and refinements of the types of memory, but the above suffice for a general orientation.
Separating memory of different types is important for both performance and security:
- Because the text and read-only data sections are not modified during program execution, they can be shared if the same program is executed more than once simultaneously. Each time the program is run, the operating system can map these portions of the virtual address spaces of different processes to the same physical memory. Every process will have the same data in these sections.
- Because the non-text sections should never be executed as instructions, they can be marked as non-executable. This means the hardware will cause an exception if an attempt is made to execute them, which should only occur if there is a bug or an attacker causes the program to execute things it should not.
The operating system does not keep track of every individual byte in the address space of a process. Most hardware does not support it, and it would require too much data. Instead, memory is managed in units of a minimum size, called a page. Whenever memory is marked executable or non-executable, writable or non-writable, it must be done in units of whole pages. 4096 bytes is a typical page size, but it varies from system to system.
What you are seeing in the different addresses of `char string[] = "Hello";` and `char *string2 = "Hello";` is that the read-only data of the string literal `"Hello"` is being put into a different page than the modifiable array initialized with `"Hello"`.
There is nothing magic about the address 0x400000, except that it was chosen as the place for read-only data in the system you are using. It could be elsewhere. There may even be linker options to move it to an address of your choosing. What is important is merely that it is separate from the modifiable data.
While the linker is reading object modules and organizing them to form one executable, it concatenates the same types of segments from different object modules. That is, it takes the text segments from each module and puts them together into one large text segment. It takes the read-only data segments from each module and puts them together into one large read-only data segment. And so on. This is simply more efficient than leaving each object module’s segments as separate pieces: if one object module used 2.5 pages for read-only data and another used 1.5 pages, then putting them together uses just 4 pages, whereas leaving them separate with fragments of pages unused would use 5 pages. (Some special segments might be processed in different ways than concatenation.)
Additionally, although memory can be managed in units of pages, there may be benefits to grouping larger amounts of memory of the same type together. If you mark an entire megabyte to have the same attributes, the operating system might use less data to keep track of it than if you marked pages individually. This could be part of the reason that the linker sets aside a large amount of program address space for a certain type of memory. (I am speculating; I am not familiar with the current motivations and design for the specific operating system you are using.)