The VAX architecture from the 70s used the virtualized linear page table approach to implement paging. VAX partitions the 32-bit virtual address space into four ranges:
- P0: 0x00000000 - 0x3fffffff.
- P1: 0x40000000 - 0x7fffffff.
- S0: 0x80000000 - 0xbfffffff.
- S1: 0xc0000000 - 0xffffffff.
P0 (called the program region) and P1 (called the control region) are user-specific partitions, S0 is the system (kernel) partition, and S1 is reserved. So each process has its own set of mappings for P0 and P1, but all processes and the kernel share the same mappings for S0.
Note that the two most significant bits of a virtual address determine which section of the virtual address space is accessed. Each section (except for S1, which is not usable) is defined by a page table. In particular, P0 and P1 are defined by virtualized page tables (the page tables themselves are mapped into virtual memory), but S0's page table is not virtualized. Each page table is a contiguous array of 4-byte page table entries. Each page table entry is either invalid or valid (which means it contains the physical address of a 512-byte page).
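As a small illustration of the section selection described above, here is a minimal Python sketch that maps a 32-bit virtual address to its section using the two most significant bits. The function name `region_of` is made up for this example.

```python
REGIONS = ("P0", "P1", "S0", "S1")  # indexed by bits 31:30 of the address


def region_of(va: int) -> str:
    """Return the name of the section selected by bits 31:30 of a 32-bit address."""
    if not 0 <= va <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit virtual address")
    return REGIONS[va >> 30]
```

For example, `region_of(0x7fffffff)` gives `"P1"` and `region_of(0x80000000)` gives `"S0"`, matching the ranges listed earlier.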
VAX provides 6 registers to define the page tables: a page table base address register and a page table length register for each of the three sections of the virtual address space P0, P1, and S0. The base address registers of P0 and P1 contain virtual addresses whose two most significant bits are 10, i.e., addresses in S0. That is, S0's page table contains the page table entries that hold the physical addresses of the pages containing P0's and P1's page tables. This allows the page tables of P0 and P1 of any process to be resident in main memory or in secondary storage. On the other hand, the base address register of S0 contains the physical address of S0's page table.
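To make the two kinds of base register concrete, here is a rough Python sketch of the translation just described. It is a simulation under simplifying assumptions, not a description of the real hardware: the length registers are ignored, the entry layout (valid bit in bit 31, frame number in the low 21 bits) is assumed for illustration, and the register names `SBR`, `P0BR`, `P1BR` and the dictionary-based physical memory are conveniences of this example.

```python
PAGE_SHIFT = 9            # 512-byte pages
PTE_SIZE = 4              # 4-byte page table entries
PTE_VALID = 1 << 31       # assumed position of the valid bit
PFN_MASK = (1 << 21) - 1  # assumed width of the frame-number field


class PageFault(Exception):
    pass


def split(va):
    """Split a virtual address into (section, virtual page number, offset)."""
    return va >> 30, (va >> PAGE_SHIFT) & 0x1FFFFF, va & 0x1FF


def pte_to_pa(pte, offset):
    """Combine a valid entry's frame number with the page offset."""
    if not pte & PTE_VALID:
        raise PageFault
    return ((pte & PFN_MASK) << PAGE_SHIFT) | offset


def translate(va, regs, phys):
    """Return (physical address, number of page table entries read).

    phys maps physical addresses to 32-bit words.
    """
    section, vpn, offset = split(va)
    if section == 3:
        raise ValueError("S1 is reserved")
    if section == 2:
        # S0: the base register holds a physical address, so one read suffices.
        return pte_to_pa(phys[regs["SBR"] + PTE_SIZE * vpn], offset), 1
    # P0/P1: the base register holds a virtual address in S0, so the entry's
    # own address has to be translated through S0's page table first.
    base = regs["P0BR"] if section == 0 else regs["P1BR"]
    pte_pa, reads = translate(base + PTE_SIZE * vpn, regs, phys)
    return pte_to_pa(phys[pte_pa], offset), reads + 1


# Example: S0 page 1 maps to frame 5, which holds P0's page table;
# P0 page 2 maps to frame 7.
regs = {"SBR": 0x1000, "P0BR": 0x80000200}
phys = {0x1004: PTE_VALID | 5, 0xA08: PTE_VALID | 7}
user_pa, user_reads = translate(0x00000404, regs, phys)
sys_pa, sys_reads = translate(0x80000210, regs, phys)
```

In this example the P0 access needs two entry reads and the S0 access needs one. If the S0 entry covering P0's page table were invalid, the first of the two reads would fault, which corresponds to P0's page table being paged out.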
So essentially, the page table of a process is divided into three contiguous page tables, two of which are virtualized and one of which is always resident in memory. From Wikipedia:
It was mentioned that creating a page table structure that contained
mappings for every virtual page in the virtual address space could end
up being wasteful. But, we can get around the excessive space concerns
by putting the page table in virtual memory, and letting the virtual
memory system manage the memory for the page table.
However, part of this linear page table structure must always stay
resident in physical memory, in order to prevent against circular page
faults, that look for a key part of the page table that is not present
in the page table, which is not present in the page table, etc.
S0's page table is the part of the linear page table that must always reside in memory (i.e., it is not virtualized). But why does it have to be like that? Suppose S0's base address register contained a virtual address rather than the physical address of the page table. How would the processor then figure out the physical base address of the page table? We would need some additional data structure at a known physical address that lets us find the physical address of the page table. For the sake of argument, let's assume that such a data structure is stored somewhere. Is it then possible for the page table to be fully swapped out to secondary storage? Yes, we can do that if the data structure has something like a "present bit" or "valid bit". However, if the present bit is clear, a page fault occurs on an access to any virtual address. The OS now needs to handle the page fault, and if the handler needs to access any virtual address, it will page fault again, and so on.
Alternatively, if the page fault handler is designed to use only physical addresses (by turning off paging) that point to data and code that are always present, then you can in effect get away with virtualizing the whole page table. But this would complicate the design of the handler considerably.
Partitioning the page table into more than one contiguous array, as VAX does, means that only some part of the page table (S0's) must be present at all times.
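The circular page fault argument can be illustrated with a toy Python model. Everything in it is hypothetical: `root_present` stands in for the imagined present bit of the whole page table, and the handler is assumed to touch a virtual address of its own. The `limit` parameter exists only so the sketch terminates; in the scenario above nothing would stop the nesting.

```python
class PageFault(Exception):
    pass


def access(va, root_present):
    """Every virtual access needs the page table; fault if it is swapped out."""
    if not root_present:
        raise PageFault(va)
    return va  # actual translation omitted in this toy model


def handle_fault(root_present, handler_data_va, depth=0, limit=5):
    """A fault handler that itself touches virtual memory.

    Returns how many nested faults occurred before the handler could run
    (or before the artificial limit was reached).
    """
    if depth == limit:
        return depth
    try:
        access(handler_data_va, root_present)
    except PageFault:
        # Handling the fault caused another fault: recurse.
        return handle_fault(root_present, handler_data_va, depth + 1, limit)
    return depth
```

With `root_present=False` the nesting only stops at the artificial limit, while with `root_present=True` the handler runs without faulting.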
But if S0's page table contains entries to find P0's and P1's page tables, then isn't that also a multi-level page table effectively? To answer this question, let's compare how address translation is done in VAX and 32-bit x86.
In VAX translation, the virtual page number is the same as the page table index.
|31 30|29                  9|8      0|
--------------------------------------
|     | virtual page number | offset |
--------------------------------------
|     |  page table index   | offset |
--------------------------------------
In 32-bit x86 translation (with PAE and PSE disabled), the virtual page number is partitioned into two indices for the two-level page table.
|31                     12|11         0|
----------------------------------------
|   virtual page number   |   offset   |
----------------------------------------
| PT 1 index | PT 2 index |   offset   |
----------------------------------------
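The two layouts above can be compared with a short Python sketch that extracts the fields from the same 32-bit address. The function and key names are made up for this example; the bit positions are the ones shown in the diagrams (9-bit offset for VAX, 12-bit offset and two 10-bit indices for x86).

```python
def vax_fields(va):
    """VAX: bits 31:30 select the section, 29:9 index the page table, 8:0 are the offset."""
    return {"section": va >> 30, "index": (va >> 9) & 0x1FFFFF, "offset": va & 0x1FF}


def x86_fields(va):
    """32-bit x86 without PAE/PSE: bits 31:22 and 21:12 are the two indices, 11:0 the offset."""
    return {"pt1": va >> 22, "pt2": (va >> 12) & 0x3FF, "offset": va & 0xFFF}
```

For the address `0x40001234`, the VAX split yields a single index (9) into P1's page table, whereas the x86 split yields one index per level (0x100 and 0x001).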
In VAX, only accesses to the user sections (P0 and P1) require a two-level lookup. More importantly, the two lookups are performed using two different virtual addresses: the address being accessed, and the S0 virtual address of its page table entry. On the other hand, accesses to the system section (S0) require only a single lookup using a single virtual address. In contrast, in x86, all accesses require two-level lookups using the same virtual address.
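The x86 architecture also allows part of its multi-level page table to be non-resident: a first-level entry can be marked not present, so the second-level table it refers to does not have to be in memory.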
We can design a hybrid page table that is potentially more powerful than both by using the S1 partition as a third user partition. We can add a base address register for its page table that contains a physical address, unlike P0's and P1's, which contain virtual addresses. In this way, even user processes can get the potential performance benefit of a non-virtualized linear page table, while still allowing virtualization (through P0 and P1) if the OS memory manager desires. I'm not aware of any architecture that has used such a design, though.