
When an OS such as Windows wants to run an executable file, it first has to load it into RAM. To avoid wasting memory, loading it only partly into memory seems smarter than loading it entirely.

So under the condition like that, my questions are:

  1. What happens exactly when control arrives at an instruction like JMP whose target address lies outside the loaded code? In other words, how does the OS recognize that it must stop execution to avoid jumping to an unmapped address, and how does it calculate which page the target address is in?

  2. How many pages of code does the OS copy into RAM before jumping to the entry point of a program? Does the OS always copy a fixed amount of code, or a fixed number of pages, or can it vary?

  3. If the OS decides how much code or how many pages should be loaded into memory, what conditions are considered in making that decision?

Thanks to all.

Farshid
  • The `assembly` tag is not suitable for this question. You can use the `paging` tag. – Hadi Brais May 02 '18 at 18:06
  • I don't think so. To be skilled at x86 assembly programming, it's essential to have good insight into operating system structure and the system architecture and its details, and vice versa. – Farshid May 03 '18 at 07:02
  • Yeah but the question is not about assembly programming at all. – Hadi Brais May 03 '18 at 17:01

2 Answers


A modern OS's program loader basically uses mmap, not read. https://en.wikipedia.org/wiki/Memory-mapped_file#Common_uses says:

Perhaps the most common use for a memory-mapped file is the process loader in most modern operating systems (including Microsoft Windows and Unix-like systems).

This creates a file-backed private mapping. (https://en.wikipedia.org/wiki/Virtual_memory).

  1. ... In other words how does the OS recognize that it must stop executing the instruction to avoid jumping to a irrelevant address and how does it calculate which page the related address situated in?

In that case code-fetch causes a page fault, just as if your code loaded data from part of a big static array that hadn't been paged in from disk yet. After possibly loading the page from disk (if it wasn't already present in the page cache) and updating the page tables, execution resumes at the address that faulted, retrying the instruction.

The CPU's virtual-memory hardware ("MMU", although that's not actually a separate thing in a modern CPU) handles detection of loads/stores/code-fetches from unmapped addresses. (Unmapped according to the actual page tables the HW can see.) When a process "logically" has some memory mapped but the OS is being lazy about it, we say the memory isn't "wired" into the page tables: a page fault will bring it into memory if it isn't already, and will wire it up in the page tables so the HW can access it (after a TLB miss triggers a hardware page walk).


If there are any runtime symbol relocations, aka fixups (needed to account for the program being loaded at a base address other than the one it was linked for, if it uses any absolute addresses in memory), they may require writing to pages of code or otherwise read-only data, dirtying those virtual-memory pages so they're backed by the pagefile instead of the executable on disk. E.g. if your C source includes `int *foo = &bar;` at global scope, or `int &foo = bar;` in C++.

  2. How many pages of code does the OS copy into RAM before jumping to the entry point of the program?

The program loader probably has some heuristics to make sure the entry point and maybe some other pages are mapped before jumping there the first time. Other than that, IDK if there are any special heuristics in the virtual-memory code for executable/library mappings vs. non-executable mappings.

Peter Cordes
  • Oh, I didn't see your answer when posting mine. I hope this doesn't make mine a duplicate :) – Margaret Bloom May 02 '18 at 15:52
  • @MargaretBloom: looks like we took different approaches. I didn't search for possible duplicate questions; did you? (With our answers here, maybe some other questions should be closed as dupes of this :P) – Peter Cordes May 02 '18 at 16:12
  • re: loader heuristics: Yes, both Linux and Windows employ prefetching mechanisms to reduce the impact of hard page faults on perf. – Hadi Brais May 02 '18 at 16:39
  • Yes, there are readahead mechanisms for hard faults (to avoid lots of small disk I/O), and there is also a distinct "fault-around" optimization for soft page faults. The latter helps in the common case that the pages exist in the page cache but aren't mapped into the current process: when a page fault is taken accessing a given page, Linux checks whether "nearby" pages are already present in RAM, and if so brings several of them in (adds them to the page tables) in the same fault. The default fault-around number is 16, and you see this in a straightforward way in ... – BeeOnRope May 03 '18 at 01:12
  • ... any process that accesses a large file-mapped area that is already cached: the number of soft faults will be 1/16th of the number of pages. This optimization makes tricks like `MAP_POPULATE` less necessary. – BeeOnRope May 03 '18 at 01:12

The processor divides the address space into sets of addresses called pages.
On x86 a typical page is 4 KiB in size, but other sizes are possible (e.g. 2 MiB, 1 GiB).
Pages are contiguous, so the first page runs from address 0x00000000 to address 0x00000fff; every address belongs to exactly one page.

A page has a set of attributes; the whole point of paging is to associate a set of attributes with each address.
Since doing that for every single address would be prohibitively expensive, pages are used instead.
All the addresses in a page share the same attributes.

I have somewhat simplified the story by not differentiating between virtual addresses (the ones that are actually paged, i.e. that can have attributes) and physical addresses (the real addresses used to access memory; a virtual address can be mapped to a different physical address).

Among the various attributes there are:

  • One that tells the CPU whether the page is to be considered not present.
    Basically, this makes the CPU generate an exception when an instruction tries to access the page (e.g. reads from it, including instruction fetch, or writes to it).
  • Permissions
    Like read-only, non-executable, supervisor, etc.
  • The physical address
    The main use of paging is isolation: it can be accomplished by letting the same virtual address X be mapped to different physical addresses Y1 and Y2 for processes P1 and P2 respectively.

Remember that these attributes are per page: they apply to the whole range of addresses in a page (e.g. they affect 4096 addresses for a 4 KiB page).

With this in mind:

  1. When a process is created, all its pages are marked as non-present; accessing them makes the CPU fault.
    When the OS loads the program, a minimal set of pages is loaded (e.g. the kernel, or part of it, the common libs, part of the program's code and data) and marked present.
    When the program accesses a page that is not loaded, the OS checks whether the address was allocated by the program; if so (a valid page fault), it loads the page and resumes execution.
    If the address was not allocated, an invalid page fault occurs and the exception is reported to the program itself.

  2. I don't know the exact number of pages loaded; one could verify it in different ways, including taking a look at the Linux kernel (for the Linux case).
    I'm not doing that because the actual strategy used may be complex, and I don't find it particularly relevant: the OS could load the whole program if it is small enough and the pressure on memory is low.
    There could be settings to tweak to choose one strategy or another.
    In general, it is reasonable to assume that only a fixed number of pages is loaded optimistically.

  3. Factors that influence the decision could be: the amount of memory available, the priority of the process being loaded, policies set by the sysadmin (to prevent bloating the system), the type of the process (a service like a DBMS could be marked as memory-intensive), restrictions on the program (e.g. on a NUMA machine a process may be marked to use mostly local memory, thereby having access to less memory than the total available), and heuristics implemented by the OS (e.g. it knows that the last execution required K pages of code/data within M milliseconds of startup).
    To put it simply and briefly: the algorithm used to load the optimal number of pages has to predict the future a bit, so the usual considerations of the case apply (assumptions, simplifications, data collection and the like).

Margaret Bloom
  • Soft page faults include the case where code or data was hot in the pagecache for file-backed mappings. It's not limited to anonymous mappings for stuff like malloc. – Peter Cordes May 02 '18 at 16:15
  • @BeeOnRope You are right, I was under the impression that a soft page fault occurred when the OS needed to load an allocated but not present page. Thank you :) – Margaret Bloom May 03 '18 at 09:20