What is in the PTE address field for an anonymously zero-fill-on-demand mapped page?

Question

When a program calls mmap to allocate an anonymous page, also known as a demand-zero page, what appears in the address field of the corresponding page table entry (PTE)? I am assuming that the kernel does not create a zero-initialized page in physical memory (and enter that physical page's page number into the PTE) until the requesting process actually touches the page — hence the term demand-zero. Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there? As a different but related question, how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?

I don't know the answer, but it's entirely up to software: once the hardware sees that the page has the present bit cleared, it ignores everything else, so the address field can contain whatever the kernel pleases - it needn't necessarily represent an address. Especially on a 64-bit system, there are plenty of available bits, so one could be used to signal "demand-zero page". — Nate Eldredge, Nov 24 '21 at 03:52
Or for that matter, the kernel could ignore the PTE contents completely, and maintain some other data structure where it can look up a virtual address and see how it's supposed to be handled. I vaguely recall that Linux does some combination of these: some of the relevant data is in the PTE, some is in other vm structs. — Nate Eldredge, Nov 24 '21 at 03:53
And I think your second question is answered by the first: the fault handler looks at the PTE and the vm data structures and sees, based on that information, that this page is meant to be demand-zero - at which point it allocates a physical page, updates the PTE to point to this page, flushes the TLBs, and returns to user space to resume the faulting instruction. — Nate Eldredge, Nov 24 '21 at 03:56
@Nate Eldredge—Thanks. Yes, of course it is up to the kernel (but thanks for documenting that for the general reader)—but my question took off from Bryant and O'Hallaron's description of the Linux page table, where the address field is either the physical page number (if the page is present in memory), the disk address (if it is not but the page is allocated), or 0/NULL (if the page is unallocated)—so they do not describe what it has if the page is allocated but not present and also has no disk address. — Amittai Aviram, Nov 25 '21 at 04:05

Marco Bonelli · Answer 1 · 2021-12-02T00:13:15.753

I am assuming that the kernel does not create a zero-initialized page in physical memory

Indeed, this is usually the case. The exceptions are special cases, for example when MAP_POPULATE is specified to explicitly request that the pages be populated up front (also called "pre-faulting").
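As a rough illustration of the MAP_POPULATE case, the sketch below maps a single anonymous page and immediately asks mincore(2) whether it is resident. The helper name resident_right_after_mmap is hypothetical, and the code assumes Linux with glibc; MAP_POPULATE is best-effort, so treat the result as an observation rather than a guarantee.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private anonymous page with the given
 * extra mmap flags (0 or MAP_POPULATE) and report whether it is
 * resident immediately afterwards, without touching it.
 * Returns 1 if resident, 0 if not, -1 on error.
 */
static int resident_right_after_mmap(int extra_flags)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t page_size = (size_t)ps;

    void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    unsigned char vec = 0;              /* one status byte per page */
    int rc = mincore(p, page_size, &vec);

    munmap(p, page_size);
    return rc == 0 ? (vec & 1) : -1;    /* bit 0 set means "resident" */
}
```

On a typical Linux system, calling it with MAP_POPULATE is expected to report the page as resident, while calling it with 0 is expected to report it as not resident, since no fault has happened yet.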

what appears in the address field of the corresponding page table entry (PTE)?

Right after mmap you don't even have a PTE allocated for the page (or, more generally, you may not have an entry at any page table level). As far as the CPU is concerned, the page doesn't even exist. If you were to walk the page table, you would simply reach a point (at an arbitrary level) where the corresponding entry is marked as "not present".

Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there?

As far as the CPU is concerned, the page is unallocated. At the first page fault, one of two things happens:

For a read page fault, the PTE is updated to point to the zero page: this is a special page that is always entirely zeroed-out and is pointed to by the PTEs of any anonymous (demand-zero) page in the system that has not been modified yet.
For a write page fault, an actual physical page will be allocated and the corresponding PTE updated to point to its physical address.

Quoting directly from the documentation:

The anonymous memory or anonymous mappings represent memory that is not backed by a filesystem. Such mappings are implicitly created for program’s stack and heap or by explicit calls to mmap(2) system call. Usually, the anonymous mappings only define virtual memory areas that the program is allowed to access. The read accesses will result in creation of a page table entry that references a special physical page filled with zeroes. When the program performs a write, a regular physical page will be allocated to hold the written data. The page will be marked dirty and if the kernel decides to repurpose it, the dirty page will be swapped out.
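The read-fault and write-fault behaviour described above can be partly observed from user space. The following is a minimal sketch, assuming Linux with glibc, that records what mincore(2) reports for an anonymous page right after mmap, after a first read, and after a first write. The names dz_report, dz_resident and dz_probe are hypothetical. Note that mincore only says whether a page is mapped and resident; it cannot distinguish the shared zero page from a freshly allocated private page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of what was observed at each step. */
struct dz_report {
    int resident_after_mmap;   /* before any access */
    int resident_after_read;   /* after the first read fault */
    int resident_after_write;  /* after the first write fault */
    int byte_read;             /* value seen by the first read */
};

/* Returns 1 if the page is resident, 0 if not, -1 on error. */
static int dz_resident(void *addr, size_t len)
{
    unsigned char vec = 0;
    if (mincore(addr, len, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Maps one anonymous page and probes it; returns 0 on success. */
static int dz_probe(struct dz_report *out)
{
    assert(out != NULL);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* volatile keeps the compiler from optimizing the accesses away */
    volatile unsigned char *p = mem;

    out->resident_after_mmap = dz_resident(mem, page);
    out->byte_read = p[0];                      /* triggers a read fault */
    out->resident_after_read = dz_resident(mem, page);
    p[0] = 42;                                  /* triggers a write fault */
    out->resident_after_write = dz_resident(mem, page);

    munmap(mem, page);
    return 0;
}
```

On a typical Linux kernel one would expect the page to be reported as not resident right after mmap and as resident after the read and after the write, with the first read returning 0. Sandboxed or non-Linux environments may report residency differently.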

how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?

When a page fault occurs, the kernel page fault handler (architecture-dependent) determines which VMA the faulting address belongs to and retrieves the corresponding struct vm_area_struct (which was created earlier, either by the kernel itself or by an mmap syscall). This structure is then passed on to architecture-independent code (do_fault()) along with the needed fault information (struct vm_fault).

The vm_area_struct contains all the remaining information needed to handle the fault (for example, the ->vm_file field, which is != NULL for a file-backed mapping). The ->vm_ops field points to a struct vm_operations_struct, which defines a set of function pointers to call on different occasions. In particular, anonymous VMAs have ->vm_ops == NULL.

For other kinds of pages, ->vm_ops->fault() is the function used when handling a page fault. This function knows what to check and how to actually handle the fault.

B & O also describe the VMA, but do not explain how the kernel could use the VMA to distinguish between, say, an unallocated page and an allocated page to be created and zero-initialized.

Simple: if no VMA covers the faulting address, the page is unallocated. If there is a VMA and vma->vm_ops == NULL, you know that the page is a demand-zero anonymous page. On a page fault, the kernel then acts as needed (read fault -> update the PTE to point to the global zero page; write fault -> allocate a page and update the PTE).
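The decision just described can be sketched as a small user-space model. This is not kernel code: the toy_ types and functions are hypothetical stand-ins for struct vm_area_struct, struct vm_operations_struct and the fault path, and the model assumes the PTE for the faulting address is still empty and ignores permission checks, swap entries and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_operations_struct. */
struct toy_vm_ops {
    int (*fault)(unsigned long addr);
};

/* Hypothetical stand-in for struct vm_area_struct: [vm_start, vm_end). */
struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;
    const struct toy_vm_ops *vm_ops;    /* NULL for an anonymous VMA */
};

/* What the model decides to do for a fault on an empty PTE. */
enum toy_action {
    TOY_SEGFAULT,            /* no VMA: the address is unallocated */
    TOY_MAP_ZERO_PAGE,       /* anonymous VMA, read fault */
    TOY_ALLOC_ZEROED_PAGE,   /* anonymous VMA, write fault */
    TOY_CALL_VM_OPS_FAULT    /* other VMA: defer to vm_ops->fault() */
};

/* Linear search for the VMA containing addr; NULL if none does. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
                                          size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
            return &vmas[i];
    return NULL;
}

/* Classify a fault using only the VMA list, as the answer describes. */
static enum toy_action toy_handle_fault(const struct toy_vma *vmas, size_t n,
                                        unsigned long addr, bool is_write)
{
    assert(vmas != NULL || n == 0);

    const struct toy_vma *vma = toy_find_vma(vmas, n, addr);
    if (vma == NULL)
        return TOY_SEGFAULT;
    if (vma->vm_ops != NULL)
        return TOY_CALL_VM_OPS_FAULT;
    return is_write ? TOY_ALLOC_ZEROED_PAGE : TOY_MAP_ZERO_PAGE;
}
```

The point of the model is that the PTE carries no "demand-zero" marker at all: whether an address is unallocated or a demand-zero page is decided by whether a VMA covers it and by what that VMA's vm_ops field holds.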

OK, thank you. For background on my question, please see my reply to @Nate Eldredge. Bryant and O'Hallaron's account sounds as if there are only three possibilities, all determined by the PTE: (1) the page is present, so the address is the page number in memory; (2) the page is absent and the PTE has an address, so that must be the disk address, and the page fault handler should find a place in memory to which to copy it, copy it, and enter the new address into the PTE; or (3) the page is absent, the address is NULL, so the page must be unallocated. — Amittai Aviram, Nov 25 '21 at 04:14
B & O also describe the VMA, but do not explain how the kernel could use the VMA to distinguish between, say, an unallocated page and an allocated page to be created and zero-initialized. — Amittai Aviram, Nov 25 '21 at 04:15
