
During the fetch phase of the instruction cycle in an x86 CPU, does the eip (PC) register get incremented to point at the next instruction at the end of that fetch phase, or only after the execution phase?

I know that MIPS CPUs increment the PC by the end of the fetch phase, but do x86 CPUs do the same?

I assume they do, because when I looked at the compiled code of a program, I noticed that the displacement encoded in a relative call instruction is relative to the next instruction, not to the current instruction.
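
For example (hypothetical addresses and displacement, just to illustrate what I mean), a 5-byte call rel32 encoded as E8 20 00 00 00 at address 0x1000 targets 0x1025, i.e. the displacement is added to the address of the next instruction:

    /* Minimal sketch of how a rel32 call target is computed (made-up numbers). */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t insn_addr = 0x1000;        /* address of the E8 opcode byte */
        int32_t  rel32     = 0x20;          /* signed displacement from the encoding */
        uint32_t next_insn = insn_addr + 5; /* E8 rel32 is 5 bytes long */
        uint32_t target    = next_insn + rel32;
        printf("call target = 0x%X\n", (unsigned)target); /* 0x1025, not 0x1020 */
        return 0;
    }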

Peter Cordes
AngryJohn
    MIPS instructions are fixed-length, which makes it possible to do this increment, since we always know by how much (4 bytes). x86 uses variable-length instructions, so incrementing the PC to the next PC requires knowing how long the current instruction is, which means decode (or a lookup in some cache). Experts here will be able to shed more light on this. RISC-V supports variable-length instructions (in multiples of 2 bytes), and the length of the complete instruction is encoded in the first few bits of each instruction, so some decode is necessary, but not a complete decode of the instruction. – Erik Eidt Mar 14 '22 at 22:34
    Modern x86 processors decode up to 4 instructions per clock. There isn't an IP register as such, because each instruction has a different address. The IP of the start of each instruction is associated with the uops of the instruction. The IP of each instruction is needed for relative branches, RIP relative-memory accesses, calls, and fault/interrupt reporting. – prl Mar 14 '22 at 23:44
    The point at which eip is incremented is not architecturally observable. (It might move forward, and then move back if an exception occurs. Or it might not move forward until the instruction is retired. You can't tell.) The fact that the call offset is relative to the next instruction doesn't prove that eip has been advanced, any more than the fact that pc-relative loads on ARM add 4 proves that pc has moved ahead by 4. (It may have been that way on early implementations, but newer implementations don't have to do it, as long as they still calculate relative offsets the same way.) – Raymond Chen Mar 15 '22 at 01:08

1 Answer


"fetch phase?" What kind of chip you got in there, a Dorito? e.g. a 386? Even 486 was pipelined and P5 Pentium was dual-issue superscalar. So 386 was the only non-pipelined x86 with an EIP, not just an IP (at least from Intel). Of course, all commercial MIPS CPUs were pipelined as well, that was literally the whole point of the RISC ISA design and name (Microprocessor without Interlocked Pipelines Stages).

x86 machine code is a byte-stream of variable-length x86 instructions, so you definitely can't know the end of an instruction until after decoding it.

For pipelined fetch/decode, x86 CPUs have to just fetch a stream of blocks and decode a window of bytes out of a fetch buffer. So the fetch address increments in the fetch stage (not "phase"), in parallel with decode and later stages working on the results of previous fetches. (Modern x86 CPUs have up to 4-wide legacy decode, e.g. Zen 2, or Skylake decoding 4 instructions per clock into up to 5 uops, up from 4 insns -> 4 uops in Sandy Bridge; perhaps even wider in Alder Lake. Usually they depend on the uop cache of already-decoded instructions to feed pipelines that are 5 or 6 uops wide; legacy decode is too hard to scale up.)
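
As a rough mental model only (not how any real front-end is built, and the "instruction set" here is made up), you can picture the fetch pointer advancing in fixed-size blocks while decode walks the buffered bytes one variable-length instruction at a time:

    /* Toy model of a decoupled fetch/decode front-end (illustration only).
       fetch_pc advances by whole blocks; decode_pc advances by instruction
       length, which is only known once the bytes have been (pre)decoded. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    enum { FETCH_BLOCK = 16, MAX_INSN = 15 };

    /* Toy "pre-decoder": pretend the first byte of each instruction is its
       length. Real x86 has to look at prefixes/opcode/ModRM to find the end. */
    static size_t insn_length(const uint8_t *p) { return p[0]; }

    int main(void) {
        /* Fake byte stream: first byte of each "instruction" is its length. */
        const uint8_t code[] = { 3,0,0,  1,  5,0,0,0,0,  2,0,  4,0,0,0 };
        const uint64_t base = 0x401000;

        uint64_t fetch_pc  = base;   /* incremented in the "fetch stage" */
        uint64_t decode_pc = base;   /* incremented as decode finds boundaries */
        size_t   buffered  = 0;      /* bytes sitting in the fetch buffer */

        while (decode_pc < base + sizeof code) {
            if (buffered < MAX_INSN) {        /* keep the decode window fed */
                fetch_pc += FETCH_BLOCK;
                buffered += FETCH_BLOCK;
            }
            size_t len = insn_length(code + (decode_pc - base));
            printf("insn at 0x%llx, len %zu, end 0x%llx\n",
                   (unsigned long long)decode_pc, len,
                   (unsigned long long)(decode_pc + len));
            decode_pc += len;   /* the end address is the next instruction's start */
            buffered  -= len;
        }
        printf("fetch pointer ran ahead to 0x%llx\n", (unsigned long long)fetch_pc);
        return 0;
    }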


As part of decode, any x86 CPU takes note of the end address (of each instruction decoded in parallel), because that's what relative jumps/calls are relative to, and the same goes for x86-64 RIP-relative addressing modes. It's also the return address that call has to push.
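
A sketch of that arithmetic, with made-up addresses and displacements (the point is just that all three uses key off the end address, not the start):

    /* End-of-instruction arithmetic: branch targets, RIP-relative addresses,
       and call return addresses all derive from the end address. Made-up numbers. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t start = 0x401000;      /* hypothetical instruction start */
        uint64_t len   = 6;             /* hypothetical instruction length */
        uint64_t end   = start + len;   /* address of the next instruction */

        int32_t  rel32 = -0x30;                          /* signed disp32 from the encoding */
        uint64_t branch_target = end + (int64_t)rel32;   /* jmp/call/jcc rel32 */
        uint64_t rip_rel_addr  = end + (int64_t)rel32;   /* [rip + disp32] operand */
        uint64_t return_addr   = end;                    /* what `call` pushes */

        printf("target 0x%llx, mem operand 0x%llx, return 0x%llx\n",
               (unsigned long long)branch_target,
               (unsigned long long)rip_rel_addr,
               (unsigned long long)return_addr);
        return 0;
    }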

The start address is only needed for some kinds of exceptions, where the address of the faulting instruction is pushed. (So the OS can repair the situation, e.g. for a #PF page fault, and return to user-space to re-run the instruction and hopefully have it succeed.) But given speculative execution, a modern x86 does also have to note the start address of every instruction and track it through the pipeline, along with the end. (Or a start plus length, or end plus length, since the length fits in 4 bits instead of needing another 64-bit address.)
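
One way to picture that bookkeeping (purely illustrative; real internal uop formats aren't documented at this level):

    /* Per-instruction IP tracking sketch: one full address plus a small length
       field is enough to recover both the start (for fault reporting) and the
       end (for relative branches / RIP-relative / return addresses). */
    #include <stdint.h>

    struct insn_ip_info {
        uint64_t start;        /* address of the instruction's first byte */
        unsigned length : 4;   /* 1..15 bytes, so 4 bits suffice */
    };

    static inline uint64_t insn_end(struct insn_ip_info ip) {
        return ip.start + ip.length;
    }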

Even the original 8086 had pipelined prefetch separate from decode, but yes, decode would increment IP as it decoded, so it had the end (but not the start) of the instruction.

8086 did not remember the start address of the instruction at all during decode (which could iterate over an arbitrary number of prefixes; the 15-byte max instruction-length limit wasn't instituted until later). It didn't have many of the exceptions that modern x86 has (not even a #UD illegal-instruction trap: every byte sequence executed as something).

Even 8086's #DE divide exception pushed the address of the following instruction, unlike later x86. (And when handling interrupts during interruptible instructions like rep cs movsb, it only pushed the address of the last prefix, not the first, so it would resume as cs movsb! Later x86 CPUs fixed that design flaw along with changing the #DE semantics.)

Peter Cordes