Why can't segmentation be completely disabled?

Question

According to the AMD manual, segmentation cannot be disabled. Why is that impossible? The manual also says that 64-bit mode disables it; what does that mean? Is segmentation completely disabled in 64-bit mode?

AMD Manual: https://s7.postimg.cc/hk15o6swr/Capture.png

Segmentation is still used as the mechanism for an x86-64 CPU to know whether to run in 32-bit mode or 64-bit mode. (The `L` bit in the segment descriptor that you set `CS` to. https://wiki.osdev.org/Global_Descriptor_Table#x86-64_Changes) So x86-64 switches between long mode and compat mode with a `jmp far` to a new code segment, or with `iret` or other things that change CS:RIP, not just RIP. Instead of inventing a new mechanism for that, they just used the existing segment stuff because the CPU still has to support it for legacy mode. — Peter Cordes, Apr 13 '18 at 07:50

Hadi Brais · Accepted Answer · 2019-07-26T21:52:43.327

Introduction

In 64-bit mode, whenever a non-null segment selector is loaded into any of the segment registers, the processor automatically loads the corresponding segment descriptor in the hidden part of the segment register, just like in protected/compatibility mode. However, the segment descriptors selected by the DS, ES, or SS selectors are completely ignored. Also the limit and attribute fields of the segment descriptors selected by the FS and GS selectors are ignored.

Intel Manual V3 3.4.4:

Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in segment descriptor registers are ignored. Some forms of segment load instructions are also invalid (for example, LDS, POP ES). Address calculations that reference the ES, DS, or SS segments are treated as if the segment base is zero.

...

In 64-bit mode, memory accesses using FS-segment and GS-segment overrides are not checked for a runtime limit nor subjected to attribute-checking.

Other than that, the base address of each of these segments is assumed to be 0 and the length to be 2⁶⁴. However, some parts of the segment descriptors selected by the CS, FS, or GS selectors still take effect. In particular, the base addresses of FS and GS specified in their respective descriptors are used.

Intel Manual V3 3.4.4:

When FS and GS segment overrides are used in 64-bit mode, their respective base addresses are used in the linear address calculation.
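The rule in the two quotes above can be written down as a tiny model. This is only a sketch of the address arithmetic, not of real hardware: the function name `linear_address` and the `hidden_base` dictionary (standing in for the hidden base fields of the segment registers) are made up for illustration, and canonical-address checks are left out.

```python
MASK64 = (1 << 64) - 1


def linear_address(segment: str, effective: int, hidden_base: dict) -> int:
    """Model the 64-bit mode rule: only FS and GS contribute a base.

    `hidden_base` maps lowercase segment names to whatever base value sits
    in the hidden part of that segment register (hypothetical container).
    """
    seg = segment.lower()
    # CS, DS, ES, and SS are treated as if their base were zero,
    # no matter what was loaded from the descriptor.
    base = hidden_base.get(seg, 0) if seg in ("fs", "gs") else 0
    # The sum is truncated to 64 bits.
    return (base + effective) & MASK64


# A DS base that was loaded from a descriptor has no effect ...
print(hex(linear_address("ds", 0x1000, {"ds": 0xDEAD0000})))
# ... while an FS base (e.g. a thread-local storage block) is added.
print(hex(linear_address("fs", 0x10, {"fs": 0x7F0000000000})))
```
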

In addition, the following fields of the CS descriptor are used: D (default bit), L (64-bit sub-mode bit), AVL (OS bits), P (present bit), DPL (privilege level bits), S (system bit), D/C (data/code bit), and C (conforming bit). Note that the base address of CS is fixed at 0 and the lengths of CS, FS, and GS are all fixed at 2⁶⁴. As Peter indicated in his comment, the L and D bits of the CS descriptor are required to be able to switch between the different sub-modes of the long mode. The other active fields of CS are also useful. Supporting different base addresses for FS and GS is useful for things like thread-local storage.
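To make the CS fields listed above concrete, here is a minimal sketch that pulls them out of a raw 8-byte descriptor and names the sub-mode that the L and D bits select. The helper names (`decode_descriptor`, `cs_sub_mode`) and the two example constants are assumptions for illustration; the constants are flat code-segment values of the kind commonly seen in GDT setups, not something taken from the manuals quoted here.

```python
def decode_descriptor(raw: int) -> dict:
    """Split an 8-byte code/data segment descriptor into named fields."""
    access = (raw >> 40) & 0xFF   # type, S, DPL, P
    flags = (raw >> 52) & 0xF     # AVL, L, D, G
    return {
        "type": access & 0xF,
        "S": (access >> 4) & 1,
        "DPL": (access >> 5) & 3,
        "P": (access >> 7) & 1,
        "AVL": flags & 1,
        "L": (flags >> 1) & 1,
        "D": (flags >> 2) & 1,
        "G": (flags >> 3) & 1,
        # The next two are meaningful only when S = 1; "conforming" applies
        # to code segments (the same bit means expand-down for data).
        "is_code": bool(access & 0x08),
        "conforming": bool(access & 0x04),
    }


def cs_sub_mode(desc: dict) -> str:
    """Name the long-mode sub-mode that a CS descriptor selects."""
    if desc["L"] and desc["D"]:
        return "reserved (L=1, D=1)"
    if desc["L"]:
        return "64-bit mode"
    if desc["D"]:
        return "compatibility mode, 32-bit default"
    return "compatibility mode, 16-bit default"


EXAMPLE_CODE64 = 0x00AF9A000000FFFF  # illustrative: L=1, D=0
EXAMPLE_CODE32 = 0x00CF9A000000FFFF  # illustrative: L=0, D=1

for raw in (EXAMPLE_CODE64, EXAMPLE_CODE32):
    print(hex(raw), cs_sub_mode(decode_descriptor(raw)))
```

The base and limit fields are deliberately not decoded, since for CS in 64-bit mode they are ignored anyway.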

Intel Manual V3 5.2.1:

Code segments continue to exist in 64-bit mode even though, for address calculations, the segment base is treated as zero. Some code-segment (CS) descriptor content (the base address and limit fields) is ignored; the remaining fields function normally (except for the readable bit in the type field).

Code segment descriptors and selectors are needed in IA-32e mode to establish the processor’s operating mode and execution privilege-level.

I think that both the readable bit and the accessed bit are ignored in 64-bit mode; these attributes are replaced by the corresponding attributes in the paging structures. I couldn't find anywhere in the Intel manual that says the accessed bit is ignored, but the AMD manual does state that clearly.

Descriptor table limit checks are still performed.

Intel Manual V3 5.3.1:

In 64-bit mode, the processor does not perform runtime limit checking on code or data segments. However, the processor does check descriptor-table limits.
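As a rough illustration of the descriptor-table limit check, the sketch below tests whether the 8-byte descriptor named by a selector lies entirely within a table whose limit is given as the offset of its last valid byte. The function name is hypothetical, and the model assumes 8-byte code/data descriptors only (system descriptors are 16 bytes in long mode and are not handled).

```python
def selector_within_table_limit(selector: int, table_limit: int) -> bool:
    """Check that the descriptor selected by `selector` fits in the table.

    `table_limit` is the offset of the last valid byte of the GDT or LDT,
    i.e. the table size in bytes minus one.
    """
    index = (selector & 0xFFFF) >> 3   # drop the TI and RPL bits
    last_byte = index * 8 + 7          # final byte of an 8-byte descriptor
    return last_byte <= table_limit


# A table with three 8-byte entries has limit 3 * 8 - 1 = 23.
print(selector_within_table_limit(0x10, 23))  # index 2: inside the table
print(selector_within_table_limit(0x18, 23))  # index 3: past the limit
```
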

So you could say that segmentation is completely disabled for the DS, ES, and SS segments, but not exactly for the other three segments. That's what "segmentation cannot be completely disabled" means.

Intel Manual V2 Says Otherwise

I quote from the description of the POP instruction.

64-Bit Mode Exceptions

#GP(0) If the memory address is in a non-canonical form.
#SS(0) If the stack address is in a non-canonical form.
#GP(selector) If the descriptor is outside the descriptor table limit.
If the FS or GS register is being loaded and the segment pointed to is not a data or readable code segment.
If the FS or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL.
#AC(0) If an unaligned memory reference is made while alignment checking is enabled.
#PF(fault-code) If a page fault occurs.
#NP If the FS or GS register is being loaded and the segment pointed to is marked not present.
#UD If the LOCK prefix is used.

Note that POPs to DS, ES, and SS are not valid in 64-bit mode, and there is no POP CS. That's why the list only talks about FS and GS. Still, it implies that the attributes of the descriptors selected by FS and GS are not completely ignored.

Similarly, the description of the MOV instruction says:

64-Bit Mode Exceptions

#GP(0)
If the memory address is in a non-canonical form.
If an attempt is made to load SS register with NULL segment selector when CPL = 3.
If an attempt is made to load SS register with NULL segment selector when CPL < 3 and CPL ≠ RPL.
#GP(selector)
If segment selector index is outside descriptor table limits. If the memory access to the descriptor table is non-canonical.
If the SS register is being loaded and the segment selector's RPL and the segment descriptor’s DPL are not equal to the CPL.
If the SS register is being loaded and the segment pointed to is a nonwritable data segment.
If the DS, ES, FS, or GS register is being loaded and the segment pointed to is not a data or readable code segment.
If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL.
#SS(0) If the stack address is in a non-canonical form.
#SS(selector) If the SS register is being loaded and the segment pointed to is marked not present.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
#UD If attempt is made to load the CS register. If the LOCK prefix is used.

But notice that #NP does not appear here! This suggests that the present bit (P) is checked only for FS, GS, CS, and SS, but not for DS and ES. (I think, however, that the P bit is checked for all of the segments.) These quotes also suggest that the RPL field of the selector loaded into any segment register is still used.

Null Segment Selector

The null segment selector is a selector whose value is 0x0000, 0x0001, 0x0002, or 0x0003. To the processor, all of these values always have the same effect. These all select the same descriptor, entry 0 of GDT.
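The four null values differ only in their low two bits, which a short sketch makes visible. The helper names below are made up for illustration; the field layout (RPL in bits 0-1, TI in bit 2, index in bits 3-15) is the standard selector format.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,      # entry number in the GDT or LDT
        "TI": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "RPL": selector & 3,         # requested privilege level
    }


def is_null_selector(selector: int) -> bool:
    """A selector is null when index = 0 and TI = 0; the RPL is free."""
    return (selector & 0xFFFC) == 0


for value in (0x0000, 0x0003, 0x0004, 0x002B):
    print(hex(value), decode_selector(value), is_null_selector(value))
```

Note that 0x0004 is not a null selector: it has TI = 1, so it names entry 0 of the LDT rather than entry 0 of the GDT.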

The null segment selector cannot be loaded into CS in any mode that uses segmentation (including 64-bit mode) because CS must contain an actual selector at all times. An attempt to do so generates a #GP exception.

The null segment selector can be loaded into SS in 64-bit mode (in contrast to other modes), but only in certain situations. For more information, refer to the part "General Protection Exception (#GP)" of Intel Manual V3 6.15.

The null segment selector can be loaded into DS, ES, GS, and FS.

Intel Manual V3 5.4.1.1:

In 64-bit mode, the processor does not perform runtime checking on NULL segment selectors. The processor does not cause a #GP fault when an attempt is made to access memory where the referenced segment register has a NULL segment selector.

I find this very interesting, as I will explain later. (I also find it weird that Chapter 3, which is dedicated to segmentation, does not state that.)

It's not perfectly clear to me whether the processor loads the null descriptor from memory into the invisible part of the segment register when loading it with the null selector.

Intel Manual V3 3.4.2:

The first entry of the GDT is not used by the processor.

Does this mean that the processor will not load the null descriptor? Or perhaps it only means that the contents of the descriptor are not used. Later it says in 3.4.4:

In order to set up compatibility mode for an application, segment-load instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the system descriptor table (GDT or LDT) and is loaded in the hidden portion of the segment register. The descriptor-register base, limit, and attribute fields are all loaded. However, the contents of the data and stack segment selector and the descriptor registers are ignored.

The description of the POP instruction from the Intel Manual V2 says:

64-BIT_MODE

IF FS, or GS is loaded with a NULL selector;
THEN
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor;
FI;

The description of the MOV instruction from the Intel Manual V2 says:

IF DS, ES, FS, or GS is loaded with NULL selector
THEN
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor;
FI;

This suggests that the null descriptor does actually get loaded, but its contents are ignored. The Linux kernel defines the null descriptor to have all bits zero. I've read in many articles and textbooks that this is mandatory. However, Collins says that this is not necessary:

The first entry in the Global Descriptor Table (GDT) is called the null descriptor. The NULL descriptor is unique to the GDT, as it has a TI=0, and INDEX=0. Most printed documentation states that this descriptor table entry must be 0. Even Intel is somewhat ambiguous on this subject, never saying what it CAN'T be used for. Intel does state that the 0'th descriptor table entry is never referenced by the processor.

AFAIK, Intel does not impose any restrictions on the contents of the null descriptor. So I guess Collins is right.

Why is 5.4.1.1 interesting?

Because it means that, in 64-bit mode, DS, ES, FS, and GS can be used to hold any of the constants 0x0000, 0x0001, 0x0002, or 0x0003. The GDT is guaranteed to contain at least the null descriptor, so the descriptor-table limit check will pass (this may not be true for other selectors). In addition, all memory references through any of these segments will still be performed successfully. The MOV instruction can then be used to copy the value from the segment register to a GPR and perform an operation on it.

AMD Manual

To be written.

Is it possible to make an invalid or read-only segment description? What if `ds` refers to that? Or can you truly use `ds` as a (slow) 16-bit scratch register for arbitrary values in long mode? — Peter Cordes, Apr 15 '18 at 09:25
@PeterCordes You can use DS, ES, or SS segment registers as scratch registers in 64-bit, but there are certain restrictions that would make exploiting that very difficult. First, every time a value is loaded into any of these registers, the CPU still accesses the selected 8-byte descriptor and loads it into the invisible part of the segment register. The contents of the descriptor are still ignored though. This is required to support mode switching. This adds perf overhead... — Hadi Brais, Apr 15 '18 at 17:27
...Second, the selectors of DS, ES, or SS *must* still select a descriptor with a valid Present bit (P=1) *or* the null segment descriptor (index 0 in GDT)(the null segment descriptor has P set to 0, how cool is that?). Otherwise, the segment-not-present exception #NP is generated. Third, the selected descriptor must be within the limit of the GDT or LDT (descriptor table limit checks are still performed). Otherwise, #GP gets thrown right at your face... — Hadi Brais, Apr 15 '18 at 17:27
...Any of the descriptor fields not used in 64-bit mode (including base and limit of *all* descriptors, except the null segment descriptor *I think*) are literally ignored, which means you store in those fields whatever you want (until you switch to a different mode of course). — Hadi Brais, Apr 15 '18 at 17:28
There is certainly a [16-bit protected compatibility sub-mode](https://en.wikipedia.org/wiki/X86-64#Operating_modes) under long mode. — Hadi Brais, Apr 15 '18 at 17:30
I might have missed other side-effects or restrictions on using DS, ES, and SS as scratch registers. Unfortunately, the Intel manual is not very precise. So it's difficult to be sure. Maybe @MichaelPetch knows more. — Hadi Brais, Apr 15 '18 at 17:36
Ah, I forgot 16-bit protected/compat mode existed. Interesting, I wonder if Linux supports it at all (for testing 16-bit code-golf functions or whatever). The vast majority of 16-bit code wants 16-bit real mode (or vm86), which is what long mode dropped. — Peter Cordes, Apr 15 '18 at 19:23
@PeterCordes See Intel manual Volume 3 Section 3.4.4 for partial information on how segmentation works in 64-bit mode. AMD has an active [patent](https://patents.google.com/patent/US6880068B1/en) specifically on how segmentation should work in 64-bit mode. But what's written in there is inconsistent with what's written in the Intel manual, which makes me wonder what's written in the current AMD manual. It would be a lot of fun to find out that it is inconsistent with both. lol. — Hadi Brais, Apr 15 '18 at 19:28
re: DS/ES/SS: Your first sentence should maybe say "... have to be Present (P=1), but the other fields are ignored". "Completely ignored" is too strong a description. — Peter Cordes, Apr 15 '18 at 19:32
@PeterCordes A few months ago, I did some research on whether and how Windows, Linux, and macOS support 16-bit protected or real mode. I found a lot of incorrect information and nonsensical discussions on this, which made me depressed for a couple of days. — Hadi Brais, Apr 15 '18 at 19:35
@PeterCordes By the way, having the compiler's register allocator use the segment registers as additional registers might actually be a good idea. I've not seen anything like that before. I also tried to look up research papers that propose or study this idea, but didn't find any. I think that would be an interesting project. — Hadi Brais, Apr 15 '18 at 19:38
Using them as extra scratch regs seems unlikely to be good on modern x86, outside of very special cases (like cache disabled). Otherwise, L1d cache is *fast*, and most instructions support memory source operands, so keeping a few read-mostly values in stack memory is better than re-reading from segment regs when needed. Reading segment regs is cheap, though, so it is plausible to use them for read-mostly values even though it takes an ALU uop to copy the value to a GP reg. (e.g. Conroe/Merom takes 1 uop with 1/clock throughput for `mov eax,ds`. Agner says it runs on p2 (load), though?) — Peter Cordes, Apr 15 '18 at 19:57
Tested on SKL: `mov ecx, ds` (`8c d9`, no operand-size prefix) is 2 uops: 1 for port1, one for p0156. — Peter Cordes, Apr 15 '18 at 20:15
@PeterCordes by p0156 you mean any of p0, p1, p5, p6? But what are those 2 uops? Do they get issued together in the same cycle to p0 and p1? — Hadi Brais, Apr 15 '18 at 20:23
Yes, I mean the other uop can run on any ALU port. I haven't measured the latency, but more likely they're dependent. (Latency measurement is hard because I can't think of how to form a dependency chain that involves `mov` from a segment reg without also including `mov` *to* a segment reg, so I wouldn't be able to separate the total round-trip latency into to/from. Maybe in 32-bit mode, or using `fs`, I could measure the mov-to-Sr latency with that and a load? The other option would be after a pipeline stall, to defeat OoO exec. Or maybe a serializing `lfence`? Not easy.) — Peter Cordes, Apr 15 '18 at 20:31