71

%AX = (%AH + %AL)

So why not %EAX = (%SOME_REGISTER + %AX) for some register %SOME_REGISTER?

Peter Cordes
Sargun Dhillon
  • 7
    minor correction, EAX doesn't equal AX, more accurately, AX represents the lower 16-bits (half) of EAX. likewise, AH and AL are the two halves of AX. – Evan Teran Oct 23 '08 at 02:40
  • @EvanTeran Is it possible to obtain the upper half of EAX as well as the lower half? – Anderson Green Mar 05 '13 at 15:27
  • 3
    @AndersonGreen: not directly (see the chart in Mike Thompson's answer). The only way to get the upper half of `EAX` would be to do some shifting/rotating. For example: `ROR EAX, 16; MOV DX, AX; ROR EAX, 16`. This will put the upper 16 bits of `EAX` into `DX` and then restore `EAX` back to its original value. Personally, I would have loved to have seen register aliases for the upper halves as well. I think it would have made a lot of code more concise. – Evan Teran Mar 05 '13 at 15:57
  • [Is it possible to access the higher part of the 32-bit and 64-bit registers? If so, which ones?](https://reverseengineering.stackexchange.com/q/18735/2563) – phuclv Jul 09 '18 at 15:24
  • Related: [How do AX, AH, AL map onto EAX?](https://stackoverflow.com/q/15191178), for a canonical duplicate about how the actual design for how partial registers map onto full registers. – Peter Cordes May 22 '23 at 00:57

3 Answers

138

Just for some clarification: in the early microprocessor days of the 1970s, CPUs had only a small number of registers and a very limited instruction set. Typically, the arithmetic unit could only operate on a single CPU register, often referred to as the "accumulator". The accumulator on the 8-bit 8080 & Z80 processors was called "A". There were six other general-purpose 8-bit registers: B, C, D, E, H & L. These six registers could be paired up to form three 16-bit registers: BC, DE & HL. Internally, the accumulator was combined with the Flags register to form the 16-bit AF register.

When Intel developed the 16 bit 8086 family they wanted to be able to port 8080 code, so they kept the same basic register structure:

8080/Z80  8086
A         AX
BC        CX
DE        DX
HL        BX
IX        SI
IY        DI

Because of the need to port 8 bit code they needed to be able to refer to the individual 8 bit parts of AX, BX, CX & DX. These are called AL, AH for the low & high bytes of AX and so on for BL/BH, CL/CH & DL/DH. IX & IY on the Z80 were only ever used as 16 bit pointer registers so there was no need to access the two halves of SI & DI.

When the 80386 was released in the mid 1980s they created "extended" versions of all the registers. So, AX became EAX, BX became EBX etc. There was no need to access the top 16 bits of these new extended registers, so they didn't create an EAXH pseudo register.

AMD applied the same trick when they produced the first 64 bit processors. The 64 bit version of the AX register is called RAX. So, now you have something that looks like this:

|63..32|31..16|15-8|7-0|
              |AH..|AL.|
              |AX......|
       |EAX............|
|RAX...................|
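
To make the chart concrete, here is a minimal Python sketch that models RAX as a plain 64-bit integer and pulls out each named view with masks and shifts. The function name `sub_registers` and the sample value are illustrative assumptions, not part of any real tool. The last entry shows that bits 31..16 are reachable only by shifting, since no register name covers them.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit value into the x86 sub-register views of RAX."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep the model to 64 bits
    eax = rax & 0xFFFFFFFF             # bits 31..0
    ax = rax & 0xFFFF                  # bits 15..0
    return {
        "RAX": rax,
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,        # bits 15..8
        "AL": ax & 0xFF,               # bits 7..0
        # No register name exists for these bits; a shift is required.
        "EAX[31:16]": (eax >> 16) & 0xFFFF,
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name:>10} = {value:#x}")
```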
Rafael Winterhalter
Mike Thompson
  • 1
    There's generally no explanation as to why there isn't a pseudo-register for say 31..16 portion of EAX. I suppose it was not needed... – Calyth Jan 07 '09 at 21:44
  • 7
    Actually, there's an undocumented 'feature' in the Z80 (which isn't actually an Intel chip anyway) that allows you to address the IX and IY as high and low bytes. The opcode is a prefix + an HL opcode; if you use an H or L opcode, you get the half-word effect. – ijw Sep 16 '12 at 00:30
  • 5
    I'd say the register correspondence is more like this (8080/Z80 → 8086, x86 encoding): A → AX (000), BC → CX (001), DE → DX (010), HL → BX (011), IX → SI (110), IY → DI (111). – noop Oct 01 '12 at 08:10
  • 1
    For anyone wanting more information, this is a fairly helpful and concise overview http://www.cs.virginia.edu/~evans/cs216/guides/x86.html – SullX Jun 28 '13 at 19:55
  • 2
    Although the splitting registers were no doubt inspired by the 8080, splitting the registers meant that the processor could be viewed as having eight 16-bit registers and no 8-bit registers, or 7+2, or 6+4, or 5+6, or 4+8. In hand-written assembly it might have been helpful if one of the 32-bit registers was separate from the 16-bit ones, and DX:AX together behaved as a 32-bit register (thus allowing 7+2+0, 7+1+2, or 7+0+4 registers of 32/16/8 bits each) but the benefits would probably not have justified the complexity. – supercat Nov 19 '14 at 23:10
34

There are a lot of answers posted here, but none really answer the given question: Why isn't there a register that directly encodes the high 16 bits of EAX, or the high 32 bits of RAX? The answer boils down to the limitations of the x86 instruction encoding itself.

16-Bit History Lesson

When Intel designed the 8086, they used a variable-length encoding scheme for many of the instructions. This meant that certain extremely-common instructions, like POP AX, could be represented as a single byte (58), while rare (but still potentially useful) instructions like MOV CX, [BX+SI+1023] could still be represented, even if it took several bytes to store them (in this example, 8B 88 FF 03).

This may seem like a reasonable solution, but when they designed it, they filled out most of the available space. So, for example, there were eight POP instructions for the eight individual registers (AX, CX, DX, BX, SP, BP, SI, DI), and they filled out opcodes 58 through 5F; opcode 60 is something else entirely (PUSHA, added with the 80186), as is opcode 57 (PUSH DI). There's no room left over for anything after or before those. Even pushing and popping the segment registers, which is conceptually nearly identical to pushing and popping the general-purpose registers, had to be encoded in a different location (down around 06/0E/16/1E) just because there wasn't room beside the rest of the push/pop instructions.

Likewise, the "mod r/m" byte used for a complex instruction like MOV CX, [BX+SI+1023] only has three bits for encoding the register, which means it can only represent eight registers total. That's fine if you only have eight registers, but presents a real problem if you want to have more.
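
As a rough illustration of those three-bit fields, the sketch below splits a "mod r/m" byte and decodes the `8B 88 FF 03` example from above. The helper names are made up for this example, and the decoder deliberately handles only opcode 8B with a 16-bit displacement (mod = 2); it is not a general disassembler.

```python
# 16-bit register names in encoding order (index = 3-bit field value).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# 16-bit addressing forms selected by the r/m field when mod != 3.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def split_modrm(byte: int) -> tuple:
    """Return the (mod, reg, rm) fields of a mod r/m byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_mov_r16_rm16(code: bytes) -> str:
    """Decode `8B /r` (MOV r16, r/m16) with a 16-bit displacement only."""
    if code[0] != 0x8B:
        raise ValueError("only opcode 8B (MOV r16, r/m16) is handled")
    mod, reg, rm = split_modrm(code[1])
    if mod != 0b10:
        raise ValueError("this sketch only handles mod == 2 (disp16)")
    # The displacement is stored little-endian; it is shown unsigned here.
    disp = int.from_bytes(code[2:4], "little")
    return f"MOV {REG16[reg]}, [{RM16[rm]}+{disp}]"


print(decode_mov_r16_rm16(bytes.fromhex("8B88FF03")))
```

With only three bits in `reg`, there is no value left over that could name a ninth register.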

(There's an excellent map of all these byte allocations in the x86 architecture here: https://i.stack.imgur.com/9u8BS.png . Notice how there's no space left in the primary map, with some instructions overlapping bytes, and even how much of the secondary "0F" map is used now thanks to the MMX and SSE instructions.)

Toward 32 and 64 Bits

So to even allow the CPU design to be extended from 16 bits to 32 bits, they already had a design problem, and they solved that with prefix bytes: by adding a special "66" byte in front of a standard 16-bit instruction, you tell the CPU you want the same instruction but the 32-bit version (EAX) instead of the 16-bit version (AX). (In 32-bit code the default is flipped: the same prefix selects the 16-bit version instead.) The rest of the design stayed the same: there were still only eight general-purpose registers in the overall CPU architecture.
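
A small sketch of how the prefix works, assuming the usual rule that "66" toggles between 16-bit and 32-bit operand size relative to the current default. The function names are illustrative, and only the `B8` opcode (MOV AX/EAX, immediate) is handled.

```python
def operand_size(default_bits: int, has_66_prefix: bool) -> int:
    """Effective operand size for a 16- or 32-bit default mode."""
    if default_bits not in (16, 32):
        raise ValueError("default operand size must be 16 or 32")
    if has_66_prefix:
        return 32 if default_bits == 16 else 16
    return default_bits


def describe_mov_imm(code: bytes, default_bits: int) -> str:
    """Decode `B8 imm` (MOV AX/EAX, imm), optionally preceded by 66."""
    prefixed = code[0] == 0x66
    body = code[1:] if prefixed else code
    if body[0] != 0xB8:
        raise ValueError("only opcode B8 is handled")
    bits = operand_size(default_bits, prefixed)
    # The immediate is as wide as the operand size, stored little-endian.
    imm = int.from_bytes(body[1:1 + bits // 8], "little")
    reg = "EAX" if bits == 32 else "AX"
    return f"MOV {reg}, {imm:#x}"


print(describe_mov_imm(bytes.fromhex("B83412"), 16))
print(describe_mov_imm(bytes.fromhex("66B878563412"), 16))
print(describe_mov_imm(bytes.fromhex("66B83412"), 32))
```

The opcode byte is the same in all three cases; only the prefix and the mode decide which width of the register is meant.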

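Similar hackery had to be done to extend the architecture to 64 bits (RAX and friends); there, the problem was solved by adding yet another set of prefix codes (REX, 40-4F). One bit of the REX prefix means "64-bit operand", and the other three each add a fourth bit to a register field (the reg and r/m fields of the "mod r/m" byte, and the index field of the SIB byte), which is how R8 through R15 become addressable. The 40-4F bytes had been the one-byte INC and DEC instructions, so 64-bit mode gave those up, along with a handful of weird old instructions nobody ever used, to free byte codes for newer stuff.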
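
For illustration, this sketch splits a REX prefix into its four flag bits and shows how one of them widens a three-bit register field to four bits. The helper names are assumptions made for the example. The sample bytes `4C 8B C0` are one encoding of `mov r8, rax`.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (0x40..0x4F) into its W, R, X and B bits."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the mod r/m reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends mod r/m r/m, SIB base, or opcode reg
    }


def extended_reg(rex_bit: int, field: int) -> int:
    """Combine one REX bit with a 3-bit field into a 4-bit register number."""
    return (rex_bit << 3) | (field & 0b111)


# 4C 8B C0: REX prefix, opcode 8B (MOV r64, r/m64), mod r/m byte C0.
rex = decode_rex(0x4C)
modrm = 0xC0
reg = extended_reg(rex["R"], (modrm >> 3) & 0b111)  # destination register
rm = extended_reg(rex["B"], modrm & 0b111)          # source register
print(rex, reg, rm)
```

Register number 8 is R8 and register number 0 is RAX, so the extra bit comes entirely from the prefix rather than from the "mod r/m" byte itself.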

An Aside on 8-Bit Registers

One of the bigger questions to ask, then, is how the heck things like AH and AL ever worked in the first place if there's only really room in the design for eight registers. The first part of the answer is that there's no such thing as "PUSH AL": some instructions simply can't operate on the byte-sized registers at all! The only ones that can are a few special oddities (like AAD and XLAT) and the byte-sized versions of the ordinary register instructions: a specific bit in the opcode byte (usually the lowest one, so 8A is the 8-bit form of 8B) switches the instruction to operate on the 8-bit registers instead of the 16-bit ones. It just so happens that there are exactly eight 8-bit registers, too: AL, CL, DL, BL, AH, CH, DH, and BH (in that order), and that lines up very nicely with the eight register slots available in the "mod r/m" byte.
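
A minimal sketch of that naming rule, assuming an opcode pair such as 8A/8B (MOV r8, r/m8 and MOV r16, r/m16) where the lowest opcode bit selects byte or word size. Not every opcode follows this pattern, and `reg_name` is an illustrative helper, not a real API.

```python
# Register names in encoding order (index = 3-bit field value).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def reg_name(opcode: int, field: int) -> str:
    """Name the register a 3-bit field selects, using the opcode's low bit."""
    wide = opcode & 1            # 0 = byte-sized form, 1 = word-sized form
    table = REG16 if wide else REG8
    return table[field & 0b111]


# The same field value names a different register depending on the size bit.
for field in range(8):
    print(field, reg_name(0x8A, field), reg_name(0x8B, field))
```

Field values 4 through 7 are where AH, CH, DH, and BH live, sharing their slots with SP, BP, SI, and DI in the word-sized form.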

Intel noted at the time that the 8086 design was supposed to be "source compatible" with the 8080/8085: There was an equivalent instruction in the 8086 for each of the 8080/8085 instructions, but it didn't use the same byte codes (they aren't even close), and you'd have to recompile (reassemble) your program to get it to use the new byte codes. But "source compatible" was a way forward for old software, and it allowed the 8085's individual A, B, C, etc. and combo "BC" and "DE" registers to still work on the new processor, even if they were now called "AL" and "BL" and "BX" and "DX" (or whatever the mapping was).

So that's the real answer: it's not that Intel or AMD intentionally "left out" a high 16-bit register for EAX, or a high 32-bit register for RAX. It's that the high 8-bit registers are a weird leftover historical anomaly, and replicating their design at higher bit sizes would be really difficult given the requirement that the architecture be backward-compatible.

A Performance Consideration

There is one other consideration as to why those "high registers" haven't been added since: inside some modern processor designs, for performance reasons, the variably-sized registers don't always overlap for real. On those designs a partial register such as AL or AH can be tracked separately from the rest of the register under the hood, and the processor notes that it will need to merge the data back together when you read the full register. How much of this happens, and what it costs, varies from one microarchitecture to the next.

(For example: If you set AL = 5, the processor doesn't update AX. But if you then read from AX, the processor quickly copies that 5 from AL into AX's bottom bits.)

By keeping the registers separate, the CPU can do all sorts of clever things like invisible register renaming to make your code run faster, but that means that your code runs slower if you do use the old pattern of treating the small registers as pieces of larger registers, because the processor will have to stall and update them. To keep all of this internal bookkeeping from getting out of hand, the CPU designers wisely chose to add separate registers on the newer processors rather than to add more overlapping registers.

(And yes, that means that it really is faster on modern processors to explicitly "MOVZX EAX, value" than to do it the old, sloppier way of "MOV AX, value / use EAX".)
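
The sketch below models only the architectural side of this in 64-bit mode, not the internal renaming: byte and word writes keep the remaining bits of the old register value, while a 32-bit write zeroes the upper half of RAX. The function names are illustrative assumptions.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_al(rax: int, value: int) -> int:
    """Write AL: bits 63..8 of the old value are kept."""
    return (rax & MASK64 & ~0xFF) | (value & 0xFF)


def write_ax(rax: int, value: int) -> int:
    """Write AX: bits 63..16 of the old value are kept."""
    return (rax & MASK64 & ~0xFFFF) | (value & 0xFFFF)


def write_eax(rax: int, value: int) -> int:
    """Write EAX: the result is zero-extended, so the old value is discarded."""
    return value & 0xFFFFFFFF


old = 0x1122334455667788
print(hex(write_al(old, 0x05)))     # merges into the old contents
print(hex(write_ax(old, 0xBEEF)))   # merges into the old contents
print(hex(write_eax(old, 0xBEEF)))  # independent of the old contents
```

Because `write_al` and `write_ax` need the old value as an input, a real CPU has to either wait for it or merge later, whereas a full-width write such as `MOVZX EAX, value` has no such dependency.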

Conclusion

With all that said, could Intel and AMD add more "overlapping" registers if they really wanted to? Sure. There are ways to worm them in if there were enough demand. But given the significant historical baggage, the current architectural limitations, the notable performance limitations, and the fact that most code these days is generated by compilers optimized for non-overlapping registers, it's highly unlikely they'll add such things any time soon.

Sean Werkema
  • 4
    Only [Intel P6/SnB uarch families rename sub-registers separately](http://agner.org/optimize/). On AMD CPUs, and Intel Atom/Silvermont/P4, writing to AL has a false dependency on the previous contents of EAX (even if you don't ever read the full register). However, you don't get partial-reg stalls for writing AL and then reading EAX. (Intel IvB removes partial-reg merging penalties for low halves (AL/BL/...), while Haswell removes the penalties even for AH/BH/... So you get the full benefit of separate dep chains for writing partial regs without paying any merging costs.) – Peter Cordes Feb 26 '16 at 01:37
  • 4
    I think it would have been interesting for AMD64 to have sliced up RAX into 8 byte registers, instead of providing access to the low byte of every reg. So you could `movzx ecx, eax{5}` or something, to unpack the 5th byte for use as an array index. It's unusual to need a huge amount of byte registers; more common to want to unpack a 64bit load into multiple bytes. `setcc` could have been changed to take an r/m32 operand, to remove the need for xor-zeroing the upper reg and that use-case for needing to write the low byte of every possible register. Diff from compat mode = more transistors :/ – Peter Cordes Feb 26 '16 at 01:44
  • Your conclusion is a little bogus. Something has to give to make it possible to encode new registers. You can't just add more byte, word, and dword high registers without giving anything up. It's not worth making common instructions longer just to occasionally save a mov/shift. That's why I suggested giving up access to the low byte of rbp, rsp, and r10-r15, or something, when using REX prefixes, to free up 7 extra encodings for byte registers that slice up RAX. That lack of orthogonality is already inconvenient for compilers, though: storing the low byte of something is common. – Peter Cordes Feb 26 '16 at 01:49
  • Bear in mind that what you're suggesting would've required significant changes to the overall x86 architectural design; AMD's goal was to have a 64-bit architecture while keeping the overall design changes as small as possible: They didn't want a *new* architecture; they wanted x86 with 64-bit registers and a 64-bit address space. I have no doubt that the diff of the VHDL (or equivalent) between their final design and the one you proposed is very significant, which means a lot more transistors to implement. – Sean Werkema Mar 04 '16 at 20:37
  • 1
    On top of that, though, there's a bigger concern, which is that more and more code today is high-level — not assembly, and not C or C++. It may be common to want to munge bits in assembly and C, but that's very uncommon in Java or C# or Python or Ruby, where you never really care about the lower bits of an `int`. Any extra silicon to add support for special high-bits registers would be silicon that's completely useless for a very large percentage of real software: Whereas silicon that provides additional word-sized registers offers a very real performance benefit across the board. – Sean Werkema Mar 04 '16 at 20:42
  • 2
    Anyway, none of your statements change the overall point of my original essay, which is that the existence of the 8-bit partial registers in the x86 architecture was a historical anomaly, a leftover anachronism, and was not something that the x64 designers felt was worth the significant effort that would have been required for them to expand on it. – Sean Werkema Mar 04 '16 at 20:47
  • My earlier comment was wrong: Haswell doesn't have free merging for AH/BH/CH/DH. See [this Q&A](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to) for all the details on partial-register false dependencies and merging on Haswell/Skylake. (TL:DR: they don't rename AL or AX separately from RAX anymore!) – Peter Cordes Aug 13 '17 at 12:08
  • 1
    `MOV CX, [BX*4+BP+1023]` is not actually encodeable, because a16 addressing doesn't have scaling, and it can only use one base register, either `bp` or `bx`, not both. The given encoding consequently is for a different effective address: `-e 100 8B 8C FF 03` \ `-u 100l1` \ `16BA:0100 8B8CFF03 mov cx, [si+03FF]` – ecm Jan 25 '20 at 20:52
  • You're right; I managed to pick an instruction that encoded in x64 but not x86 when I wrote that answer. I've switched it out with `8B 88 FF 03`, which emulated `debug` assures me is `mov cx, [bx+si+1023]`, which is a pretty close example to what I'd written there before, and still expresses well how complicated a CISC x86 instruction can get. Thanks for noticing this! – Sean Werkema Jan 27 '20 at 15:23
  • Re: my 2016 comments: it would have been possible and maybe interesting, but *not* a good idea to provide access to each byte of RAX instead of to the low byte of every full register. It's not a problem for byte loads; code can and usually should use `movzx r10d, byte [rsi]`, but it would be a problem for byte stores: orthogonality is a very good thing for compilers being able to do `mov [rdi], r10b` just as easily as `mov [rdi], al`. Compilers manage ok without byte arithmetic like x86 `add al, cl` on RISC ISAs, but ARM `strb` works on any byte register. – Peter Cordes May 22 '23 at 01:05
27

In the old 8-bit days, there was the A register.

In the 16-bit days, there was the 16 bit AX register, which was split into two 8 bit parts, AH and AL, for those times when you still wanted to work with 8 bit values.

In the 32-bit days, the 32 bit EAX register was introduced, but the AX, AH, and AL registers were all kept. The designers did not feel it necessary to introduce a new 16 bit register that addressed bits 16 through 31 of EAX.
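
Since no name exists for bits 16 through 31, code that wants them has to move them down first. As a hedged sketch, the Python below simulates the rotate sequence suggested in the comments on the question (`ROR EAX, 16; MOV DX, AX; ROR EAX, 16`); the helper names are illustrative.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def upper_half_via_rotate(eax: int) -> tuple:
    """Return (DX, EAX) after the three-instruction rotate sequence."""
    eax = ror32(eax, 16)   # ROR EAX, 16: upper half now sits in AX
    dx = eax & 0xFFFF      # MOV DX, AX
    eax = ror32(eax, 16)   # ROR EAX, 16: EAX is back to its original value
    return dx, eax


dx, eax = upper_half_via_rotate(0x12345678)
print(hex(dx), hex(eax))
```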

Greg Hewgill
  • It continues to 64-bit registers today: the 64-bit RAX register includes EAX as the lower 32 bits. – Ferruccio Oct 23 '08 at 01:48
  • 4
    'E' and 'X' might both stand for "Extended", but what does the 'R' in RAX mean? – Hugh Allen Oct 23 '08 at 02:07
  • 6
    "R"egister, presumably. There are additional new registers that are just named R+number. – Curt Hagenlocher Oct 23 '08 at 02:33
  • 1
    I've combed through the AMD64 manual and still have no idea what R stands for, except "register"---just to be in line with their R8--R15. (BTW, I have no idea why, with their fixation with numbers, they don't just rename all the existing general registers too. :-P) – C. K. Young Oct 23 '08 at 02:38
  • 8
    i.e., R0 => RAX, R1 => RCX, R2 => RDX, R3 => RBX, R4 => RSP, R5 => RBP, R6 => RSI, R7 => RDI. :-) (BTW it's a pet peeve of mine when people get the register ordering wrong; the order is AX, CX, DX, BX, SP, BP, SI, DI. :-P) – C. K. Young Oct 23 '08 at 02:40
  • 13
    Which register is :-P? :D – Jeff Yates Oct 23 '08 at 03:19
  • What were the names of the registers in the 8008, and why didn't the 8080 provide backward compatibility for them? – Windows programmer Oct 23 '08 at 03:25
  • 1
    @ffpf: :-P came after the list terminator, the full stop (or period, if you're American, or dot, if you're a mathematician); thus, it doesn't count. :-) – C. K. Young Oct 23 '08 at 04:18
  • The 8008 had the same register set as the 8080 but the program counter (PC) was only 14 bits so it could only address 16K – Ferruccio Oct 23 '08 at 11:03
  • 1
    I want to program in the language where :-P is a register. – Erik Forbes May 21 '09 at 19:11
  • You can still easily get the big end of eax. Just subtract ax from eax, then divide the result by 16. – Catharsis Feb 11 '10 at 08:44
  • 5
    @Austin - or shift right 16, saving yourself both an opcode and, more importantly, a divide instruction. Not to mention the divide is by 2^16. Otherwise, spot on ;) – ijw Sep 16 '12 at 00:32
  • 4
    @HughAllen, Paul Nathan, and Curt Hagenlocher: the "R" in "RAX", "RBX", etc. is itself an abbreviation for "Register Extension". Think "Register Extension for BX = RBX". See section 1.2.7 of the AMD programmer's manual: http://www.ptlsim.org/papers/x86-64/Opteron-InstructionSet-24594.pdf – William Leara Dec 21 '12 at 00:22
  • 1
    @Chris Jester-Young: You can actually use R0 to R7 to refer to the original 8 GPRs, it just isn't very common. https://stackoverflow.com/questions/9129933/do-we-also-refer-to-the-registers-rax-rbx-etc-as-r1-r2-and-so-on/ – ecm Jan 25 '20 at 20:59