x86 registers: MBR/MDR and instruction registers

Question

From what I have read, the IA-32 architecture has ten 32-bit and six 16-bit registers.

The 32-bit registers are as follows:

Data registers - EAX, EBX, ECX, EDX
Pointer registers - EIP, ESP, EBP
Index registers - ESI, EDI
Control registers - EFLAGS (EIP is also classified as a control register)

The 16-bit registers are as below:

Code Segment (CS): It contains all the instructions to be executed.
Data Segment (DS): It contains data, constants and work areas.
Stack Segment (SS): It contains data and return addresses of procedures or subroutines.
Extra Segment (ES): Pointer to extra data.
F Segment (FS): Pointer to more extra data.
G Segment (GS): Pointer to still more extra data.

However, I can't find any information on the Current Instruction Register (CIR) or Memory Buffer Registers (MBR)/Memory Data Registers (MDR). Are these registers referred to as something else? And are these registers 32-bit?

I assume they are 32-bit and that most commonly used instructions under this architecture are under 4 bytes long. From observation, many instructions seem to be under 4 bytes, for example:

PUSH EBP (55)
MOV EBP, ESP (8B EC)
LEA (8D 44 38 02)

For longer instructions, the CPU will use prefix codes and other optional codes. Longer instructions will require more than one cycle to complete, with the number of cycles depending on instruction length.

Am I correct in that the registers in question are 32-bit in length? And are there any other registers in the IA-32 architecture that I should also be aware of?

Peter Cordes · Accepted Answer · 2018-07-26T00:33:07.770

No, the registers you're talking about are implementation details that don't exist as physical registers in modern x86 CPUs.

x86 doesn't specify any of those implementation details you find in toy / teaching CPU designs. The x86 manuals only specify things that are architecturally visible.

Intel and AMD's optimization manuals go into some detail about the internal implementation, and it's nothing like what you're suggesting. Modern x86 CPUs rename the architectural registers onto much larger physical register files, enabling out-of-order execution without stalling for write-after-write or write-after-read data hazards. (See Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? for more details about register renaming). See this answer for a basic intro to out-of-order exec, and a block diagram of an actual Haswell core. (And remember that a physical chip has multiple cores).
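As a rough illustration of register renaming (a toy sketch, not how any real Intel or AMD renamer is built), the following maps a few architectural registers onto an unbounded pool of hypothetical physical registers `p0, p1, ...`. The `rename` function and the `(dest, src1, src2)` instruction format are invented for this example; real hardware has a finite physical register file and a free list.

```python
from itertools import count

ARCH_REGS = ["eax", "ebx", "ecx", "edx"]

def rename(program):
    """Rename (dest, src1, src2) triples onto fresh physical registers."""
    fresh = count()  # assumption: unlimited physical registers
    # Current architectural -> physical mapping (the register alias table).
    table = {reg: f"p{next(fresh)}" for reg in ARCH_REGS}
    renamed = []
    for dest, *srcs in program:
        phys_srcs = [table[s] for s in srcs]  # read sources via current mapping
        table[dest] = f"p{next(fresh)}"       # every write gets a new register
        renamed.append((table[dest], *phys_srcs))
    return renamed

program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "eax"),  # edx = eax op eax  (true dependency on eax)
    ("eax", "ebx", "ebx"),  # eax = ebx op ebx  (WAW with 1st, WAR with 2nd)
]

for uop in rename(program):
    print(uop)
```

After renaming, the third instruction writes `p6` while the first wrote `p4`, so the two writes to `eax` no longer conflict; only the true read-after-write dependency (the second instruction reading `p4`) remains.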

Unlike a simple or toy microarchitecture, almost all high-performance CPUs support miss-under-miss and/or hit-under-miss: they allow multiple outstanding cache misses instead of blocking all memory operations until the first miss completes.

You could build a simple x86 that had a single MBR / MDR; I wouldn't be surprised if original 8086 and maybe 386 microarchitectures had something like that as part of the internal implementation.

But for example a Haswell or Skylake core can do 2 loads and 1 store per cycle from/to L1d cache (See How can cache be that fast?). Obviously they can't have just one MBR. Instead, Haswell has 72 load-buffer entries and 42 store-buffer entries, which all together are part of the Memory Order Buffer which supports out-of-order execution of loads / stores while maintaining the illusion that only StoreLoad reordering happens / is visible to other cores.
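To make the StoreLoad point concrete, here is a deliberately tiny model of per-core store buffers. It is only a sketch under strong assumptions: the `Core` class, the shared `memory` dict, and the explicit `drain` step are all made up for illustration, and a real memory order buffer is far more involved.

```python
from collections import deque

class Core:
    def __init__(self, memory):
        self.memory = memory         # shared dict standing in for coherent cache
        self.store_buffer = deque()  # pending (addr, value) stores, oldest first

    def store(self, addr, value):
        # Buffered: not yet visible to other cores.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store forwarding: the youngest matching entry in our own buffer wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        # Commit buffered stores to shared memory in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

def litmus():
    memory = {"x": 0, "y": 0}
    c0, c1 = Core(memory), Core(memory)
    c0.store("x", 1)
    c1.store("y", 1)
    r0 = c0.load("y")  # c1's store is still sitting in c1's buffer
    r1 = c1.load("x")  # c0's store is still sitting in c0's buffer
    c0.drain()
    c1.drain()
    return r0, r1, memory
```

In this interleaving both loads return 0 even though each core stored before it loaded: each load became globally visible before the other core's earlier store did, which is the StoreLoad reordering that x86's memory model permits.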

Since P5 Pentium, naturally-aligned loads/stores up to 64 bits are guaranteed atomic, but before that only 32-bit accesses were atomic. So yes, if 386/486 had an MDR, it could have been 32 bits. But even those early CPUs could have cache between the CPU and RAM.

We know that Haswell and later have a 256-bit path between L1d cache and execution units, i.e. 32 bytes, and Skylake-AVX512 has 64-byte paths for ZMM loads/stores. AMD CPUs split wide vector ops into 128-bit chunks, so their load/store buffer entries are presumably only 16 bytes wide.

Intel CPUs at least merge adjacent stores to the same cache line within the store buffer, and there are also the 10 LFBs (line-fill buffers) for pending transfers between L1d and L2 (or off-core to L3 or DRAM).

Instruction decoding: x86 is variable-length

x86 is a variable-length instruction set; even not counting prefixes, the longest instructions are longer than 32 bits. This was true even for 8086. For example, add word [bx+disp16], imm16 is 6 bytes long. But 8088 only had a 4-byte prefetch queue to decode from (vs. 8086's 6-byte queue), so it had to support decoding instructions without having loaded the whole thing from memory. 8088 / 8086 decoded prefixes 1 cycle at a time, and 4 bytes of opcode + modRM is definitely enough to identify the length of the rest of the instruction, so it could decode that much and then fetch the disp16 and/or imm16 if they weren't fetched yet. Modern x86 can have much longer instructions, especially with SSSE3 / SSE4 requiring many mandatory prefixes as part of the opcode.
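A minimal sketch of why the first few bytes are enough to find an instruction's length: the function below handles only a handful of one-byte opcodes in 32-bit mode with no prefixes (push r32, mov r32, r/m32, lea, and the 0x81 group with an imm32). The function names are hypothetical and this is nowhere near a complete decoder, but it covers the three byte sequences quoted in the question.

```python
def modrm_tail_length(code, i):
    """Bytes taken by ModRM + optional SIB + displacement (32-bit addressing)."""
    modrm = code[i]
    mod, rm = modrm >> 6, modrm & 7
    length = 1                      # the ModRM byte itself
    if mod == 3:
        return length               # register operand: nothing follows
    if rm == 4:                     # a SIB byte follows ModRM
        sib = code[i + 1]
        length += 1
        if mod == 0 and (sib & 7) == 5:
            length += 4             # SIB with no base register: disp32
    if mod == 1:
        length += 1                 # disp8
    elif mod == 2 or (mod == 0 and rm == 5):
        length += 4                 # disp32
    return length

def instruction_length(code):
    """Length of the first instruction in `code` (tiny opcode subset only)."""
    opcode = code[0]
    if 0x50 <= opcode <= 0x57:      # push r32
        return 1
    if opcode in (0x8B, 0x8D):      # mov r32, r/m32 ; lea r32, m
        return 1 + modrm_tail_length(code, 1)
    if opcode == 0x81:              # group 1: op r/m32, imm32
        return 1 + modrm_tail_length(code, 1) + 4
    raise ValueError(f"opcode {opcode:#04x} not handled by this sketch")

print(instruction_length(bytes.fromhex("55")))        # push ebp
print(instruction_length(bytes.fromhex("8BEC")))      # mov ebp, esp
print(instruction_length(bytes.fromhex("8D443802")))  # lea eax, [eax+edi+2]
print(instruction_length(bytes.fromhex("818000010000785634 12".replace(" ", ""))))
```

The last example, add dword [eax+0x100], 0x12345678, comes out at 10 bytes, yet the function never looks past the opcode, ModRM, and SIB bytes to work that out. The displacement and immediate can be fetched afterwards.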

It's also a CISC ISA, so keeping around the actual instruction bytes internally isn't very useful; you can't use the instruction bits directly as internal control signals the way you can with a simple MIPS.

In a non-pipelined CPU, yes there might be a single physical EIP register somewhere. For modern CPUs, each instruction has an EIP associated with it, but many are in flight at once inside the CPU. An in-order pipelined CPU might associate an EIP with each stage, but an out-of-order CPU would have to track it on a per-instruction basis. (Actually per uop, because complex instructions decode to more than 1 internal uop.)

Modern x86 fetches and decodes in blocks of 16 or 32 bytes, decoding up to 5 or 6 instructions per clock cycle and placing the decode results in a queue for the front-end to issue into the out-of-order part of the core.

See also the CPU-internals links in https://stackoverflow.com/tags/x86/info, especially David Kanter's write-ups and Agner Fog's microarch guides.

BTW, you left out x86's many control / debug registers. CR0..4 are critical for 386 to enable protected mode, paging, and various other stuff. You could use a CPU in real mode only using the GP and segment regs, and EFLAGS, but x86 has far more architectural registers if you include the non-general-purpose regs that the OS needs to manage.
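As a small illustration of what CR0 holds, this sketch decodes three of its architecturally defined flag bits from a raw value: PE (bit 0, protected mode enable), WP (bit 16, write protect), and PG (bit 31, paging). The `decode_cr0` helper and the sample values are just for illustration; actually reading CR0 requires privileged code, which this does not attempt.

```python
# Bit positions of a few CR0 flags.
CR0_BITS = {"PE": 0, "WP": 16, "PG": 31}

def decode_cr0(value):
    """Return which of the selected CR0 flags are set in `value`."""
    return {name: bool((value >> bit) & 1) for name, bit in CR0_BITS.items()}

print(decode_cr0(0x80050033))  # a value with PE, WP and PG all set
print(decode_cr0(0x00000010))  # only ET (bit 4) set: real mode, no paging
```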

First of all, you are my guru; I have deep respect for you ;). I wanted to know: are the Program Counter / Instruction Register / Location Counter / Memory Buffer Register / Memory Data Register the same register? And are they not present in modern CPUs? — Ahtisham, Mar 10 '20 at 16:07
@Ahtisham: Of course they're not all the same register. PC is a pointer and IR (if it exists) is the instruction value it points to. MBR/MDR (if they exist) are both different and are used for data load/store as well (again in a simple design without split L1 caches where instruction and data access don't happen in parallel). Modern x86 CPUs need to know the address of every in-flight instruction (in case it faults) so there isn't one *single* PC register, just data associated with a uop. x86 never uses an "instruction register" because instructions aren't fixed length and need decoding. — Peter Cordes, Mar 10 '20 at 17:36
Reading this answer as a web developer of 9 years feels like what it felt like to look at JavaScript for the first time as a 9th grade high-school student who'd only ever heard of code as a concept. It sounds so advanced and I'm trying to move into the security research space, learning assembly and about x86 but looking at your level of knowledge reminds me that it'll be another 9 years before I can look back at this answer and comprehend it as I would easily read the source code of a JS library today after 9 years of learning the subject. — J.Todd, Jun 23 '21 at 17:06
@J.Todd: It used to be that you didn't really need to understand how CPUs worked internally for security, just the ISA on-paper model of how instructions execute (unless you're trying to exploit multi-threaded code with race bugs or insufficient memory-order). But now with Spectre, and especially [MDS](https://en.wikipedia.org/wiki/Microarchitectural_Data_Sampling) vulns like L1TF, not to mention Meltdown, suddenly CPU-architecture is relevant for security. :/ I'm interested in it for performance, primarily (and just because it's fun). — Peter Cordes, Jun 23 '21 at 17:30
@PeterCordes I'm sure you're aware (because of course you'd have been interested in the Movfuscator) but a treat just in case you aren't: the 3(4?) presentations by Chris Domas at DEFCON using incredibly clever x86 CPU reverse engineering to find [embedded RISC micro-architecture backdoors in x86 CPUs](https://www.youtube.com/watch?v=jmTwlEh8L7g), [find proprietary x86 instructions (password protected MSRs)](https://youtube.com/watch?v=XH0F9r0siTI), and even [(using page fault analysis to fuzz) find malformed instructions that halt entire processors](https://youtube.com/watch?v=ajccZ7LdvoQ) — J.Todd, Jun 23 '21 at 19:11
@J.Todd: yes, thanks. Some of those have come up on SO before in comments, which is how I first became aware of them, e.g. Margaret Bloom and harold linked on [How to tell length of an x86-64 instruction opcode using CPU itself?](https://stackoverflow.com/a/51546463). I also updated my answer on [Why does Intel hide internal RISC core in their processors?](https://stackoverflow.com/a/32866797) a couple months ago with my take on that Via feature and vulnerability. But I don't think I'd seen the password-protected one, or the lockup. — Peter Cordes, Jun 23 '21 at 19:34
