How virtualized page table works

Question

Reading about the virtualized page table concept, where part of the page table is put in virtual memory. Wikipedia as well as Patterson & Hennessy (5.7 elaboration in the Page Faults section) say that the reason you don't put the entire page table into virtual memory is that it can lead to circular page faults. But it seems to me like there's an even more basic issue - how you would find the page table in the first place? It seems like you necessarily need some record of where in physical memory to begin the translation process by locating the page table.

To clarify my question a bit more, my confusion is not "why can't all the page tables be in memory". Rather, it's that the rationale given is to "prevent against circular page faults". It seems like the rationale should be more fundamentally that "the processor must have some physical address as a starting point", and whatever you call that, it's a physical page table. What does this have to do with "circular page faults"? Seems much more fundamental.

Could you include quotes from Wikipedia and mention which page from Patterson & Hennessy you are referring to so we can better understand the question? — Hadi Brais, May 14 '18 at 01:14
I've only heard about virtualized page tables in the context of a VM guest / host, where there are two separate translation steps (from guest virtual to guest physical, and host virtual to host physical). — Peter Cordes, May 14 '18 at 01:38
Like Wikipedia says, if you wanted to page out parts of your page table, you'd need at least the top level page directory to stay resident at a known physical address. But its entries could either be physical addresses (when the next level directory is present) or some kind of virtual or disk references for next-level tables that aren't present. — Peter Cordes, May 14 '18 at 01:39

Hadi Brais · Answer 1 · 2018-08-20T06:57:23.783

The VAX architecture from the 70s used the virtualized linear page table approach to implement paging. VAX partitions the 32-bit virtual address space into four ranges:

P0: 0x00000000 - 0x3fffffff.
P1: 0x40000000 - 0x7fffffff.
S0: 0x80000000 - 0xbfffffff.
S1: 0xc0000000 - 0xffffffff.

P0 (called the program region) and P1 (called the control region) are user-specific partitions, S0 is the system (kernel) partition, and S1 is reserved. So each process has its own set of mappings for P0 and P1, but all processes and the kernel share the same mappings for S0.

Note that the most two significant bits of a virtual address are used to determine which section of the virtual memory to access. Each section (except for S1, which is not usable) is defined by a page table. In particular, P0 and P1 are defined by a virtualized page table (the page tables are mapped to virtual memory), but S0's page table is not virtualized. Each page table is a contiguous array of 4-byte page table entries. Each page table entry is either invalid or valid (which means it contains the physical address of a 512-byte page).

VAX provides 6 registers to define the page tables: a page table base address register and a page table length register for each of the three sections of the virtual address space P0, P1, and S0. The base address registers of P0 and P1 contain virtual addresses whose two most significant bits are 10. That is, S0's page table contains the page table entries that contain the physical addresses of P0 and P1. This allows the page tables of P0 and P1 of any process to be resident in main memory or in secondary storage. On the other hand, the base address register of S0 contains the physical address of S0's page table.

So essentially, the page table of a process is divided into three contiguous page tables, two of which are virtualized and one is always resident in memory. From Wikipedia:

It was mentioned that creating a page table structure that contained mappings for every virtual page in the virtual address space could end up being wasteful. But, we can get around the excessive space concerns by putting the page table in virtual memory, and letting the virtual memory system manage the memory for the page table.

However, part of this linear page table structure must always stay resident in physical memory, in order to prevent against circular page faults, that look for a key part of the page table that is not present in the page table, which is not present in the page table, etc.

S0's page table is the part of the linear page table that must always reside in memory (i.e., not virtualized). But why does it have to be like that? What happens if S0's base address register contained a virtual address rather than a physical address of the page table? But in that case, how can the processor figure out the physical base address of the page table? We would need some additional data structure with a known physical address that enables us to figure out the physical address of the page table. Let's for the sake of argument assume that we have such a data structure that is stored somewhere. Is it possible for the page table to be fully swapped out to secondary storage? Yea we can do that if we have something like a "present bit" or "valid bit" in that data structure. However, the present bit was set to false, a page fault occurs when accessing memory at any virtual address. The OS now needs to handle the page fault an if it requires to access any virtual address, it will page fault again, and so on.

Otherwise, in general, if the page fault handler is designed to use only physical addresses (by turning off paging) that point to data and code that are always present, then effectively you can get around virtualizing the whole page table. But this would complicate the design of the handler considerably.

Partitioning the page table into more than one contiguous array like how it's done in VAX means that some part of the page table (S0's) must be present at all times.

But if S0's page table contains entries to find P0's and P1's page tables, then isn't that also a multi-level page table effectively? To answer this question, let's compare how address translation is done in VAX and 32-bit x86.

In VAX translation, the virtual page number is the same as the page table index.

|31|29                  9|8       0|
------------------------------------
|  | virtual page number | offset  |
------------------------------------
|  |  page table index   | offset  |
------------------------------------

In 32-bit x86 translation (with PAE and PSE disabled), the virtual page number is partitioned into two indices for the two-level page table.

|31                  12|11        0|
------------------------------------
|  virtual page number |  offset   |
------------------------------------
| PT 1 index|PT 2 index|  offset   |
------------------------------------

In VAX, only accesses to the user page tables require two-level lookups. More importantly, the two lookups are performed using two different virtual addresses. On the other hand, accesses to the system page table require only a single lookup using a single virtual address. In contrast, in x86, all accesses require two-level lookups using the same virtual address.

The x86 architecture supports virtualized multi-level page tables.

We can design a hybrid page table that is potentially more powerful than both. If we use the S1 partition as a third user partition. We can add a base address register for its table that contains a physical address rather than a virtual address (like P0's and P1's). In this way, even processes can get the potential performance benefit of a linear page table, while still allowing virtualization if the OS memory manager desired. I'm not aware of any architecture that has used such design though.

Your answer makes perfect sense. My confusion is not exactly "why can't all the page tables be in memory". Rather, it's that the _rationale given_ is to "prevent against circular page faults". It seems like the rationale should be more fundamentally "the processor must have some physical address as a starting point", and whatever you call that, it's a physical page table. What does this have to do with "circular page faults"? Seems much more fundamental. Thanks! — allstar, May 15 '18 at 16:18
@allstar The Wikipedia section on virtualized page tables was written in 2004 and never updated after that. I don't know how to contact the person who wrote that part. But I think the person confused virtualized page tables with multi-level page tables. In articular, imagine a processor with registers that keep track of the base virtual address, base physical address, and present bit of the page table. If the present bit is zero, a page fault occurs, but to call the page fault handler, its virtual address must be translated to a physical address... — Hadi Brais, May 15 '18 at 17:27
...But to do that, the page table needs to be accessed, which results again in a page fault, and so on. This is what that person meant. But this is multi-level page table design, not virtualized page table. — Hadi Brais, May 15 '18 at 17:27

score 2 · Accepted Answer · answered May 15 '18 at 17:24

re: your update: those are two different ways of stating the same requirement.

It's a reductio ad absurdum argument: consider a CPU where the page-table pointers were all virtual addresses: the TLB-miss or page-fault handler would need to take another page fault or TLB miss because it doesn't have a physical address it can use directly.

This is the "circular" problem you'd run into, and why there has to be a physical address somewhere or else it's a catch-22 / turtles all the way down type of situation.

score 0 · Answer 3 · answered May 14 '18 at 14:59

But it seems to me like there's an even more basic issue - how you would find the page table in the first place?

That is a problem and processors take different approaches. On approach is to have separate page tables for the system space and the user space. The system space page tables specify physical addresses. The user space tables specify logical addresses. Thus, accessing the user space tables requires a logical to physical translation through the system space tables.

How virtualized page table works

3 Answers3

Linked