
As far as I know, on x86 all the operating system does is set the CR3 register and let the MMU do the rest to translate linear addresses into physical addresses. But since the MMU does this work, it has to know how the multi-level page table is divided up. However, I've never seen any documentation that discusses this detail. Is there something I've misunderstood?

Peter Cordes
choxsword

1 Answer


The page-table format is part of the ISA: hardware (including the page walker built into each CPU core) implements it, and software must therefore follow it.

So page-walk hardware in an x86 CPU is just built to chop up addresses the x86 way. You can look at the ISA documentation describing the page table format as documenting what the hardware will be looking for.

When there's a choice of page-table formats (legacy 32-bit paging with 10 index bits per level, PAE with 9 bits per level, or, on x86-64, 5-level PML5 page tables for 57 virtual address bits vs. the standard 4-level PML4), the choice is made by control-register bits (for example, `CR4.PAE` and `CR4.LA57`).
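To make "chopping up addresses" concrete, here is a minimal sketch of how a linear address splits into per-level table indices plus a page offset under each format, assuming 4 KiB pages. The names `FORMATS` and `split_address` are made up for illustration; real hardware does this with fixed wiring, not a lookup table.

```python
# Index widths per level (top level first) and the page-offset width,
# for 4 KiB pages. The dictionary keys are illustrative names only.
FORMATS = {
    "legacy32": ([10, 10], 12),         # 2 levels, 32-bit linear address
    "pae":      ([2, 9, 9], 12),        # 3 levels; the top table has only 4 entries
    "pml4":     ([9, 9, 9, 9], 12),     # 4 levels, 48-bit virtual address
    "pml5":     ([9, 9, 9, 9, 9], 12),  # 5 levels, 57-bit virtual address
}

def split_address(addr, fmt):
    """Return ([index per level, top first], page_offset) for a linear address."""
    widths, offset_bits = FORMATS[fmt]
    offset = addr & ((1 << offset_bits) - 1)
    shift = offset_bits + sum(widths)   # start just above the topmost index field
    indices = []
    for width in widths:
        shift -= width                  # move down to the start of this field
        indices.append((addr >> shift) & ((1 << width) - 1))
    return indices, offset
```

For example, `split_address(0xC0101234, "legacy32")` gives indices `[768, 257]` with offset `0x234`, while the same address under `"pae"` gives `[3, 0, 257]` with the same offset.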

"The MMU" isn't really a separate thing in x86 (or other modern CPUs); it's part of a CPU core and can be affected by control-register bits. If it were a fully separate chip, you might set a top-level page directory with an `out` instruction or a store to a special MMIO address, instead of a `mov` to `cr3`.

Peter Cordes
  • According to this answer, the page-table format is mainly determined by the specific CPU architecture. But here is an answer that says [_"In Linux, the kernel maintains a three-level page table (regardless of the CPU’s capabilities)"_](https://unix.stackexchange.com/questions/364591/how-does-the-cpu-knows-which-physical-address-is-mapped-to-which-virtual-address/364601#364601), which suggests that the OS can decide how many levels to use regardless of the CPU architecture. – choxsword Feb 17 '21 at 04:05
  • And here's another answer indicating that [the same CPU architecture behaves differently with respect to page-table levels under different kernel versions](https://unix.stackexchange.com/questions/379230/how-many-page-table-levels-does-linux-kernel-use-4-or-5). What confuses me is how much of a role the kernel plays in page-table walking. Does the kernel just set some registers, or does it do more than that, such as running kernel callbacks during address translation? – choxsword Feb 17 '21 at 04:09
  • @scottxiao: That first answer (about "three-level" page tables) might be talking about how the kernel keeps track of "logical" mappings, separate from the data structures that hardware looks at. (Note that `/proc//maps` doesn't change when a page is evicted and swapped out.) Or else it's talking about an array of 2-level (legacy x86?) hardware page tables, one for each process. I don't know a lot about how exactly the kernel keeps track of its data structures, but I am 100% sure about what it ultimately tells the hardware to do, and how. – Peter Cordes Feb 17 '21 at 04:13
  • @scottxiao: The kernel only sets registers; x86 page walking is done fully in hardware and is invisible to software. [What happens after a L2 TLB miss?](https://stackoverflow.com/q/32256250) (Unlike on MIPS, where a TLB miss calls back into an OS-provided TLB-miss handler.) Also related: [Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)?](https://stackoverflow.com/q/46509152) shows the x86-64 page table format, and https://wiki.osdev.org/Paging shows how to set it up (including for legacy x86 2-level). – Peter Cordes Feb 17 '21 at 04:15
  • For architectures like **x86-64 with PML5 5-level page tables**, as you mentioned, do you mean that the kernel must provide a 5-level table on such an architecture? – choxsword Feb 17 '21 at 04:25
  • @scottxiao: yes, if a kernel chooses to set that control-register bit, then it must `mov cr3, address_of_pml5` with the physical address of the top of a 5-level page directory. PML5 is an optional feature that hardware provides but which OSes do *not* have to use, so unless you have boatloads of RAM (or for some other reason want boatloads of virtual address space), there's no reason to use it: it would just make page walks slightly slower, and be more levels of stuff for the kernel to manage. – Peter Cordes Feb 17 '21 at 04:34
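
The division of labor described in these comments can be sketched as a toy software model: the "kernel" side (`map_page`) only builds tables in memory, and the "hardware" side (`walk`) follows them starting from the value in CR3, with the number of levels fixed by the paging mode. This is a simplified illustration, not real kernel or hardware code. It assumes 4 KiB pages and 9 index bits per level, treats bit 0 of an entry as the present bit, and ignores large pages, permission bits, and the TLB. The names `walk`, `map_page`, `phys_mem`, and `alloc` are all hypothetical.

```python
ENTRIES = 512        # 9 index bits per level
PRESENT = 1          # bit 0 of an entry marks it as present
ADDR_MASK = ~0xFFF   # drops the low 12 bits (simplified: real entries also have high flag bits)

def level_index(vaddr, level):
    """Index into the table at `level` (0 = lowest level, which maps 4 KiB pages)."""
    return (vaddr >> (12 + 9 * level)) & 0x1FF

def walk(phys_mem, cr3, vaddr, levels=4):
    """Model of the hardware page walker: physical address, or None for a page fault."""
    table_addr = cr3 & ADDR_MASK
    for level in range(levels - 1, -1, -1):
        entry = phys_mem[table_addr][level_index(vaddr, level)]
        if not entry & PRESENT:
            return None                  # real hardware would raise #PF here
        table_addr = entry & ADDR_MASK   # next table, or the final page frame
    return table_addr | (vaddr & 0xFFF)

def map_page(phys_mem, cr3, vaddr, frame, alloc, levels=4):
    """Model of the kernel's job: fill in entries, creating tables as needed."""
    table_addr = cr3 & ADDR_MASK
    for level in range(levels - 1, 0, -1):
        index = level_index(vaddr, level)
        entry = phys_mem[table_addr][index]
        if not entry & PRESENT:
            new_table = alloc()          # hypothetical allocator returning a 4 KiB-aligned address
            phys_mem[new_table] = [0] * ENTRIES
            entry = new_table | PRESENT
            phys_mem[table_addr][index] = entry
        table_addr = entry & ADDR_MASK
    phys_mem[table_addr][level_index(vaddr, 0)] = frame | PRESENT
```

Passing `levels=5` to both functions models the PML5 case: the kernel has to build one more level, and every walk takes one more step, which is the extra cost mentioned above.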