What happens to instruction pointers when address overrides are used to target a smaller address space?

Question

What happens to the instruction pointer when an address-size override targets a smaller address space, e.g. the default address size is 32-bit but the override switches it to 16-bit?

So, let's say we're in x86-32 mode and the default is a 32-bit memory space for the current code segment we're in.

Further, the EIP register contains the value 87654321h.

If I use 67h to override the default and make the memory space 16-bit for just that one instruction, how does the processor compute the offset into the current code segment?

Some bits in the IP have to be ignored, otherwise you'd be outside the 16-bit memory space specified by the override.

So, does the processor just ignore the 8765 part in the IP register?

That is, does the processor just use the 4 least significant hex digits (the low 16 bits) and ignore the 4 most significant hex digits?

What about address overrides associated with access to data segments?

For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].

Now, ebx contains a 32 bit number.

Does the 67h override change the above instruction to: mov eax, [bx]?

What about "constant pointers"? Example: mov eax, [87654321].

Would the 67h override change it to mov eax, [4321]?

Does the memory override affect the offset into the data segment also or just the code segment?

How do address overrides affect the stack pointer?

If the stack pointer contains a 32 bit number (again we'll use 87654321h) and I push or pop, what memory is referenced?

Pushing and popping indirectly accesses memory.

So, would you only use the 4321 part of the stack pointer, ignoring the most significant bits?

Also, what about the segment bases themselves?

Example: we're in x86-32 mode, default 32 bit memory space, but we use 67h override.

The CS register points to a descriptor in the GDT whose segment base is, again lol, 87654321h.

We're immediately outside of the 16-bit memory range without even adding an offset.

What does the processor do? Ignore the 4 most significant hex digits? The same question can be applied to the segment descriptors for the data and stack segments.

address-size prefixes have no effect on fetching future instructions. See the answer to your other question (https://stackoverflow.com/questions/46188388/what-happens-when-you-use-a-memory-override-prefix-but-all-the-operands-are-regi) for a link to Intel's manual that describes prefixes in detail. Edit this question if there's still stuff that you don't understand after reading that. — Peter Cordes, Sep 14 '17 at 05:10
That's like 10 questions. You should try some of this by assembling `db 0x67` / `mov eax, [ebx]` and then disassembling. (And also assembling `mov eax, [bx]`). Also single-step it in a debugger. Then you can edit out the questions already answered directly by Intel's documentation. `EIP` is never truncated by prefixes, except maybe on `jmp` or `call` instructions, I forget. — Peter Cordes, Sep 15 '17 at 01:02

Peter Cordes · Answer 1 · 2017-09-15T06:02:35.000

2

0x67 is the address-size prefix. It changes the interpretation of an addressing mode in the instruction.

It does not put the machine temporarily into 16-bit mode or truncate EIP to 16-bit, or affect any other addresses that don't explicitly come from an [addressing mode] in the instruction.

For push/pop, the instruction reference manual entry for push says:

The address size is used only when referencing a source operand in memory.

So in 32-bit mode, a16 push eax would still set esp -= 4 and then store [esp] = eax. It would not truncate ESP to 16 bits. The prefix would have no effect, because the only memory operand is implicit, not explicit.
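To make that concrete, here is a minimal Python sketch of the behaviour described above. It is only a model, not real decoding: `push32` and its `address_size_prefix` parameter are hypothetical names, and it assumes a flat 32-bit stack segment, so the stack pointer is the full ESP.

```python
def push32(esp: int, memory: dict, value: int, address_size_prefix: bool = False) -> int:
    """Model `push r32` in 32-bit mode (assumed: 32-bit stack segment).

    `address_size_prefix` is accepted but deliberately ignored: the stack
    access is implicit, so a 0x67 prefix does not change how ESP is used.
    """
    esp = (esp - 4) & 0xFFFFFFFF        # full 32-bit ESP is decremented
    memory[esp] = value & 0xFFFFFFFF    # store goes to [esp], not [sp]
    return esp


stack = {}
new_esp = push32(0x87654321, stack, 0xDEADBEEF, address_size_prefix=True)
print(hex(new_esp))
```

With or without the prefix flag, the model returns the same ESP and writes to the same location, which is the point the answer is making.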

push [ebx] is affected by the 67 prefix, though.

db  0x67
push dword [ebx]

would decode as push dword [bp+di], and load 32 bits from that 16-bit address (ignoring the high 16 bits of those registers). (16-bit addressing modes use a different encoding than 32/64-bit ones, with no optional SIB byte.)

However, it would still update the full esp, and store to [esp].

(For the effective-address encoding details, see Intel's volume 2 PDF, Chapter 2: INSTRUCTION FORMAT, table 2-1 (16-bit) vs. table 2-2 (32-bit).)
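As a rough illustration of why the same ModR/M byte names different registers, here is a small Python sketch of the two tables. `describe_rm` is a hypothetical helper; it covers only the r/m field and does not decode SIB bytes or read displacement values.

```python
# r/m field meanings for mod != 11, following the 16-bit and 32-bit ModR/M tables.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
RM32 = ["eax", "ecx", "edx", "ebx", "SIB", "ebp", "esi", "edi"]


def describe_rm(modrm: int, addr16: bool) -> str:
    """Describe the memory operand selected by a ModR/M byte."""
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return "register"                 # no memory operand at all
    if addr16:
        if mod == 0 and rm == 0b110:
            return "[disp16]"             # special case: no base register
        return "[" + RM16[rm] + ("", "+disp8", "+disp16")[mod] + "]"
    if mod == 0 and rm == 0b101:
        return "[disp32]"                 # special case: no base register
    return "[" + RM32[rm] + ("", "+disp8", "+disp32")[mod] + "]"


# FF 33 is `push dword [ebx]`; its ModR/M byte is 0x33.
print(describe_rm(0x33, addr16=False))
print(describe_rm(0x33, addr16=True))
```

The ModR/M byte 0x33 has mod=00 and r/m=011, which selects ebx in the 32-bit table and bp+di in the 16-bit table.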

In 64-bit mode, the address-size prefix would turn push [rbx] into push [ebx].

Since some forms of push can be affected by the address-size prefix, this might not fall into the category of meaningless prefixes, use of which is reserved and may produce unpredictable behaviour in future CPUs. (What happens when you use a memory override prefix but all the operands are registers?). OTOH, that may only apply to the push r/m32 opcode for push, not for the push r32 short forms that can't take a memory operand.

I think the way it's worded, Intel's manual really doesn't guarantee that even the push r/m32 longer encoding of push ebx wouldn't decode as something different in future CPUs with a 67 prefix.

edited Sep 15 '17 at 06:02

answered Sep 15 '17 at 01:10

Peter Cordes


67 ff 33 is push dword [bp+di], not push dword [bx] – prl Sep 15 '17 at 01:29
@prl: Thanks, I would normally use 64 -> 32 examples, but the OP was asking about 32->16 and I forgot to account for the different encoding of the ModR/M byte in 16-bit addressing modes :P – Peter Cordes Sep 15 '17 at 01:33
I only remember it because of going the other way: it allows using [eax] in 16-bit mode, which can be convenient. – prl Sep 15 '17 at 01:54
@prl: Yeah, I'm sure I would have noticed if it had been `[eax]`. I remember thinking "ok, yeah, `[bx]` is a valid 16-bit addressing mode, so this looks right". Derp. >.< – Peter Cordes Sep 15 '17 at 01:55
Can you explain how the two address schemes are different? What is the 16 bit scheme? And the 32 bit scheme? They seem totally unrelated. Also, how did you guys memorize the machine codes? – matrix Sep 15 '17 at 03:50
@matrix: I assume @ prl just assembled `db 0x67` / `mov eax, [ebx]`, and then used a disassembler on the resulting object file. For the effective-address encoding details, see [Intel's volume 2 PDF](https://software.intel.com/en-us/articles/intel-sdm#three-volume), Chapter 2: INSTRUCTION FORMAT, table 2-1 (16-bit) vs. table 2-2 (32-bit). This is *right* below the stuff about prefixes which prl pointed you to in answer to your other question, so just page down from there. 32/64-bit uses a SIB byte for indexed addressing modes, instead of cramming limited options into the ModR/M byte – Peter Cordes Sep 15 '17 at 04:00
@Peter, No, I did it in my head (with a little help from the tables in the manual). :-) – prl Sep 15 '17 at 04:41

prl · Answer 2 · 2017-09-15T01:47:43.783

1

For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].
Now, ebx contains a 32 bit number.
Does the 67h override change the above instruction to: mov eax, [bx]?
What about "constant pointers"? Example: mov eax, [87654321].
Would the 67h override change it to mov eax, [4321]?

The address size override doesn't just change the size of the address, it actually changes the addressing scheme.
A 67 override on mov eax, [ebx] changes it to mov eax, [bp+di].
A 67 override on mov eax, [87654321] changes it to mov eax, [di] (followed by and [ebx+65], eax and some xchg instruction).
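The second case comes down to instruction length. Here is a minimal Python sketch, assuming the assembler chose the ModR/M form `8B 05 21 43 65 87` for `mov eax, [87654321h]` (which is what the decoding above implies). `modrm_bytes` is a hypothetical helper that counts how many bytes the addressing mode consumes after the opcode.

```python
def modrm_bytes(code: bytes, addr16: bool) -> int:
    """Bytes used by ModR/M + optional SIB + displacement, starting at code[0]."""
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return 1                                   # register operand only
    if addr16:
        disp = {0: 2 if rm == 0b110 else 0, 1: 1, 2: 2}[mod]
        return 1 + disp                            # 16-bit modes never have a SIB
    has_sib = rm == 0b100
    if mod == 0:
        # disp32 with no base: r/m = 101, or SIB base field = 101
        no_base = (code[1] & 0b111) == 0b101 if has_sib else rm == 0b101
        disp = 4 if no_base else 0
    else:
        disp = 1 if mod == 1 else 4
    return 1 + int(has_sib) + disp


tail = bytes.fromhex("0521436587")       # bytes after the 8B opcode
print(modrm_bytes(tail, addr16=False))   # whole tail belongs to the mov
used = modrm_bytes(tail, addr16=True)    # with 67h: only the ModR/M byte
leftover = tail[used:]                   # decoded as separate instructions
print(leftover.hex())
```

With the prefix, the `mov` ends after its ModR/M byte, so the four displacement bytes `21 43 65 87` are decoded as new instructions: `21 43 65` is `and [ebx+65h], eax`, and `87` starts an `xchg`.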

edited Sep 15 '17 at 01:47

answered Sep 15 '17 at 01:41

prl


Why does the scheme change so radically? How can `mov eax, [87654321]` change to `mov eax, [di]` (followed by `and [ebx+65], eax` and some xchg instruction). Those two are totally unrelated. – matrix Sep 15 '17 at 03:49
@matrix: the 16-bit ModR/M encoding for `[disp16]` is not the same as the 32-bit encoding for `[disp32]`, so the displacement bytes end up decoded as separate instructions instead of decoded as part of the addressing mode. (Length-changing prefixes are fun... Actually they're a potential decode bottleneck for Intel CPUs, in the pre-decode stage that finds instruction boundaries before feeding them to the decoders.) – Peter Cordes Sep 15 '17 at 04:07
OK, I've read the table and see what you guys are talking about. But, why would you want to override the addressing scheme in the first place? Overriding the operands makes sense; I can see the value in it. But what's gained by changing the memory addressing scheme? – matrix Sep 15 '17 at 04:15
@matrix: Usually nothing. There's very little use for it in 32-bit mode. In 16-bit mode, it lets you use 32-bit addresses, or use scaled indices, or registers that 16-bit addressing modes can't use. – Peter Cordes Sep 15 '17 at 04:55
In 64-bit mode with 32-bit pointers (e.g. Linux's [x32 ABI](https://en.wikipedia.org/wiki/X32_ABI)), compilers sometimes end up using address-size prefixes instead of extra sign/zero extending instructions: https://godbolt.org/g/q7VYJR (Because it has to make sure that addresses wrap around instead of going above 4GB with a 64-bit addressing mode if the index was 32-bit negative number). The other option would be to sign-extend the 32-bit int to a 64-bit register and use a 64-bit addressing mode. (The ABI already requires that pointer args to functions are zero-extended to 64-bit.) – Peter Cordes Sep 15 '17 at 04:57
@matrix: you might think it would be useful with `lea`, for stuff like `lea eax, [edx+ecx]` in 64-bit mode. But no, actually `lea eax, [rdx+rcx]` will always give you exactly the same result, because high bits don't affect low bits in addition or left-shifts (https://stackoverflow.com/questions/34377711/which-2s-complement-integer-operations-can-be-used-without-zeroing-high-bits-in). Possibly it could be useful in 16-bit mode for `lea eax, [bx+di]` or something, instead of `lea eax, [ebx+edx]` / `movzx eax, ax` – Peter Cordes Sep 15 '17 at 05:04
@Peter 32-bit addresses in real mode? You mean trivial 32-bit addresses, right? zero extended? How can you break the 1MB real mode memory space limit? – matrix Sep 15 '17 at 05:05
@matrix: Apparently you can set up segment limits > 64k using protected mode, then switch back to real mode. This is called ["big/huge unreal mode"](https://stackoverflow.com/questions/32807155/accessing-4gb-ram-in-real-mode). https://en.wikipedia.org/wiki/Unreal_mode and http://wiki.osdev.org/Unreal_Mode. Or just use it for 32-bit LEA in 16-bit mode. Or use it in 16-bit protected mode, which I think is a thing. (Real mode isn't the only mode with 16-bit operand-size by default). – Peter Cordes Sep 15 '17 at 05:07
Oops, earlier I meant to write `lea eax, [bx+di]` in **32-bit** mode. That *might* actually be the most efficient way to do that, if it doesn't LCP-stall on Intel CPUs! It's different from 64-bit mode because writing `ax` doesn't zero-extend into `eax`, but writing `eax` does zero-extend into `rax`. – Peter Cordes Sep 15 '17 at 05:11
@Peter OK I perused the article you linked to. It seems that you switch into protected mode, populate a GDT or LDT, then load a segment register with a pointer into the GDT or LDT thereby filling the cached portion of the segment register. Then, switch back to real mode, and the processor treats the cached portion of the segment register as being part of the regular portion of the segment register. So, the segment register becomes a full sized 32 bit register. Correct? – matrix Sep 15 '17 at 05:22
@matrix. Right. Switching modes doesn't clear the segment descriptor cache. There's no such thing as a 32-bit segment register. It's either used directly (`seg << 4 + offset` in real mode), or it's an index into a descriptor table. I wonder if that was the main use-case Intel had in mind for address-size prefixes? Seems like a waste of opcode coding space otherwise. Unless they were thinking of 16-bit pointers in 32-bit mode to save memory / cache footprint? Unlikely, because 16-bit addressing modes suck so much. – Peter Cordes Sep 15 '17 at 05:25
@Peter There's no such thing as 32 bit segment register? How big are the segment registers including the hidden cached part? – matrix Sep 15 '17 at 05:31
@matrix: The cached descriptor isn't part of the segment *register* itself. `push ss` / `pop ss` doesn't save/restore it, it reloads it based on the index. The descriptor-caching is architecturally visible, and is even accessible directly with [`wrfsbase`](http://felixcloutier.com/x86/WRFSBASE:WRGSBASE.html) in 64-bit mode, to set the base address (useful for thread-local storage). But it's not visible through the segment register value itself. There's also [`lsl`](http://felixcloutier.com/x86/LSL.html), but I think that just reads the table, not the cached value. – Peter Cordes Sep 15 '17 at 05:48
You can use 32-bit addressing modes for instruction operands even in ordinary real mode, as long as the effective address is within the segment limit. Otherwise, it will cause #GP or #SS. – prl Sep 15 '17 at 05:49
@Peter, "used directly (seg << 4 + offset) in real mode". Actually, even in real mode, the processor uses the base address in the descriptor cache to compute operand addresses. Whenever the segment register is loaded in real mode, the base address is loaded with seg << 4. – prl Sep 15 '17 at 05:54
@Peter. Yes, 16-bit protected mode is a thing. It is the mode the processor is in after setting CR0.PE (before the far jump to load the CS, which usually immediately follows). It can also be entered from 32-bit protected mode by a far jump to a code segment with the D bit equal to 0. – prl Sep 15 '17 at 05:54
@prl: I think that's what @ matrix meant by "trivial" 32-bit addresses. Certainly it could be useful to use 32-bit addressing modes in 16-bit mode for their other features (like scaled index and full choice of registers). Actually maybe that alone would be enough for Intel to justify using up that potential opcode as a prefix byte, since they expected people to still mostly write 16-bit code, and using an extra prefix could save instructions and maybe even total code bytes. Of course, software using that wouldn't be portable to 286 and earlier. – Peter Cordes Sep 15 '17 at 05:57
I'm confused. How exactly can you get the computer to see an address space greater than 1 MB while in real mode? – matrix Sep 15 '17 at 05:58
As you said earlier: load a segment descriptor in protected mode with whatever base and limit you want and then switch back to real mode. – prl Sep 15 '17 at 06:00
Load a descriptor into a segment register? So, the segment register will hold the offset within the GDT where the descriptor lies. And the base and limit will automatically, by the hardware, be loaded into the cached portion of the segment register, right? Then switch into real mode? And how does that extend your memory space beyond 1 MB? – matrix Sep 15 '17 at 06:07
Yes, that's right. (When I said "load a segment descriptor", that was shorthand for "load a segment register with a selector that points to a descriptor" with the specified characteristics.) So then the segment base is a 32-bit address, and the limit can also be up to 32 bits, allowing you to address up to 4GB. – prl Sep 15 '17 at 06:17
OK. So, when back into real mode with the segment registers loaded this way from protected mode, the system reads the base of the segment from the _cached_ part, yes? But, you can't change the segment register from then on. Because if you do, the base address stored in the cached part will change to equal what was just placed into the visible part of the segment register times 16. And because the visible part is 16 bits, the new base will be 20 bits thereby destroying your >1MB memory range. Yes? – matrix Sep 15 '17 at 06:26
Yes. And the segment limit is loaded with 0ffffh also when you load a segment register in real mode. – prl Sep 15 '17 at 06:33
Right. 64KB. Cool. Thanks. I think I have it. – matrix Sep 15 '17 at 06:36
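
The base-address part of this comment thread can be sketched in Python. This is a simplified model, not a description of any particular CPU: `SegmentRegister` and its methods are hypothetical names, only the visible selector and the cached base are modelled, and segment limits, access rights and A20 wrap-around are left out.

```python
from dataclasses import dataclass


@dataclass
class SegmentRegister:
    selector: int = 0   # visible 16-bit value
    base: int = 0       # hidden, cached base address

    def load_from_descriptor(self, selector: int, descriptor_base: int) -> None:
        """Protected-mode load: the base is copied from the descriptor."""
        self.selector = selector & 0xFFFF
        self.base = descriptor_base & 0xFFFFFFFF

    def load_in_real_mode(self, value: int) -> None:
        """Real-mode load: the cached base is recomputed as value << 4."""
        self.selector = value & 0xFFFF
        self.base = self.selector << 4

    def linear(self, offset: int) -> int:
        """Address formation always uses the cached base, in either mode."""
        return (self.base + offset) & 0xFFFFFFFF


ds = SegmentRegister()
ds.load_from_descriptor(0x10, 0x87654321)   # base set while in protected mode
print(hex(ds.linear(0x10)))                 # cached base is used as-is
ds.load_in_real_mode(0x1234)                # reloading overwrites the cached base
print(hex(ds.linear(0x5678)))
```

The model shows why the cached base survives a mode switch but not a real-mode reload of the segment register: address formation never looks at the visible selector directly.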
