I had always assumed that when linear-address translation encounters a TLB miss, the processor walks the paging structures in memory. However, the Intel Manual, Vol. 3, Section 4.10.3, defines so-called paging-structure caches, which I had not heard of before.
This is what the manual says can happen on a TLB miss:
If the processor does not find a relevant TLB entry or PDE-cache entry, it may use the upper bits of the linear address (for 4-level paging, bits 47:30; for 5-level paging, bits 56:30) to select an entry from the PDPTE cache that is associated with the current PCID. It can then use that entry to complete the translation process (locating a PDE, etc.) as if it had traversed the PDPTE, the PML4E, and (for 5-level paging) the PML5E corresponding to the PDPTE-cache entry.
and
If the processor does not find a relevant TLB entry, PDE-cache entry, or PDPTE-cache entry, it may use the upper bits of the linear address (for 4-level paging, bits 47:39; for 5-level paging, bits 56:39) to select an entry from the PML4E cache that is associated with the current PCID. It can then use that entry to complete the translation process (locating a PDPTE, etc.) as if it had traversed the corresponding PML4E.
So a TLB miss does not necessarily mean walking the entire paging-structure hierarchy.
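To make sure I understand the lookup order, here is a minimal Python sketch of how I read it for 4-level paging with 4-KByte pages. It is a simplified model, not the hardware's actual behavior: the names `cache_tag`, `LOOKUP_ORDER`, and `loads_for_translation` are my own, PCIDs and large pages are ignored, and the bit ranges for the TLB (47:12) and the PDE cache (47:21) come from the same manual section rather than from the excerpts quoted above.

```python
def cache_tag(linear_address: int, low_bit: int, high_bit: int = 47) -> int:
    """Return bits high_bit:low_bit of a linear address (4-level paging)."""
    width = high_bit - low_bit + 1
    return (linear_address >> low_bit) & ((1 << width) - 1)


# (structure, lowest tag bit, paging-structure loads still needed after a hit)
# Assumption: 4-level paging, 4-KByte pages, lookup tried in this order.
LOOKUP_ORDER = [
    ("TLB", 12, 0),          # hit: translation is complete
    ("PDE cache", 21, 1),    # hit: only the PTE must be read
    ("PDPTE cache", 30, 2),  # hit: PDE and PTE must be read
    ("PML4E cache", 39, 3),  # hit: PDPTE, PDE and PTE must be read
]


def loads_for_translation(linear_address: int, cached: dict) -> tuple:
    """Return (structure that hit, number of paging-structure loads needed).

    `cached` maps a structure name to the set of tags it currently holds.
    """
    for name, low_bit, loads in LOOKUP_ORDER:
        if cache_tag(linear_address, low_bit) in cached.get(name, set()):
            return name, loads
    # Nothing cached: read the PML4E, PDPTE, PDE and PTE from memory.
    return "full walk", 4


if __name__ == "__main__":
    addr = 0x7F1234567000
    # Pretend an earlier access in the same 1-GByte region filled the PDPTE cache.
    cached = {"PDPTE cache": {cache_tag(addr, 30)}}
    print(loads_for_translation(addr, cached))
    print(loads_for_translation(addr, {}))
```

If this model is right, the cost of a TLB miss ranges from one to four paging-structure loads, depending on which cache (if any) holds an entry for the upper bits of the address.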
Could you give some examples of perf events that describe accesses to the paging-structure caches, and explain how to optimize for paging-structure cache usage?