
Is all x86 32-bit assembly code valid x86 64-bit assembly code?

I've wondered whether 32-bit assembly code is a subset of 64-bit assembly code, i.e., whether all 32-bit assembly code can be run in a 64-bit environment.

I guess the answer is yes, because 64-bit Windows is capable of executing 32-bit programs, but then I've seen that 64-bit processors support a 32-bit compatibility mode?

If not, please provide a small example of 32-bit assembly code that isn't valid 64-bit assembly code and explain how the 64-bit processor executes the 32-bit assembly code.

Shuzheng
  • You need to use the compatibility mode, yes. – Jester May 20 '17 at 17:59
  • @Jester - can you give an example of a 32-bit instruction that isn't a valid 64-bit instruction? Can you intermix 32-bit and 64-bit assembly code? – Shuzheng May 20 '17 at 18:17
  • `push eax` is not valid in 64 bit mode for example. – Jester May 20 '17 at 18:20
  • @Jester - Can I intermix 32-bit and 64-bit code in compatibility mode, so that `push eax` becomes valid although mixed with 64-bit instructions? – Shuzheng May 20 '17 at 18:36
  • Or is compatibility mode solely for the purpose of executing 32-bit instructions alone using a 64-bit processor? – Shuzheng May 20 '17 at 18:37
  • You can't mix them. For the most part, you can use 32 bit operands in 64 bit code though (but that's still 64 bit code). – Jester May 20 '17 at 18:40
  • Some of the sub-op codes (like the one for mod r/m, the memory addressing mode, such as [ebx] versus [rbx]) may be different between 32 bit and 64 bit mode. I don't know if any of the primary opcodes are different. – rcgldr May 20 '17 at 19:51
  • You cannot even run all programs running in 32-bit mode when the CPU is in 64-bit mode: "Virtual mode" (needed for calling the BIOS) for example does not work in "long mode". – Martin Rosenau May 20 '17 at 20:19
  • @Jester Yes, you can mix them: far jump to a 32 bit selector, then run 32 bit code, then far jump to a 64 bit selector. – fuz May 20 '17 at 20:21
  • That's not my definition of mixing though ... that means they are separate not mixed ;) – Jester May 20 '17 at 20:27
  • @Jester You can have both pieces in a single program, which means that you can mix them in the same way you can mix ARM and thumb code. – fuz May 20 '17 at 20:29
  • (`inc eax`, `nop`) in 32-bit mode and `rex xchg eax, eax` in 64-bit mode are encoded the same way: 0x40 0x90. You have to tell the machine which is right; you cannot just execute it and hope for the best. – sivizius May 21 '17 at 00:18
  • [What is the difference between assembly language of x86 and x64 architecture?](http://stackoverflow.com/q/20050765/995714) – phuclv May 21 '17 at 16:36
  • Related: [x86 32 bit opcodes that differ in x86-x64 or entirely removed](https://stackoverflow.com/q/32868293) – Peter Cordes Feb 08 '22 at 16:29

2 Answers


A modern x86 CPU has three main operation modes (this description is simplified):

  • In real mode, the CPU executes 16 bit code with paging disabled and without protected-mode segmentation. Memory addresses in your code refer to physical addresses: the content of a segment register is shifted left by 4 bits and added to the offset to form the physical address.
  • In protected mode, the CPU executes 16 bit or 32 bit code depending on the segment descriptor selected by the CS (code segment) register. Segmentation is enabled, and paging can be (and usually is) enabled. Programs can switch between 16 bit and 32 bit code by far jumping to an appropriate segment. The CPU can enter the sub-mode virtual 8086 mode to emulate real mode for individual processes from inside a protected mode operating system.
  • In long mode, the CPU executes 64 bit code. Segmentation is mostly disabled, and paging is enabled. The CPU can enter the sub-mode compatibility mode to execute 16 bit and 32 bit protected mode code from within an operating system written for long mode. Compatibility mode is entered by far-jumping to a CS selector whose descriptor has the L (long) bit clear; the descriptor's D bit then selects 16 or 32 bit code. Virtual 8086 mode is unavailable.
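The three-way split above can be summarized as a decision over a few control bits. The sketch below is a simplified model, assuming only CR0.PE, EFER.LMA, EFLAGS.VM and the L and D bits of the current code-segment descriptor matter; the class and function names are hypothetical, and details such as System Management Mode are ignored:

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Hypothetical snapshot of the control bits that select the execution mode."""
    cr0_pe: bool     # CR0.PE: protection enabled
    efer_lma: bool   # EFER.LMA: long mode active
    eflags_vm: bool  # EFLAGS.VM: virtual-8086 flag (only meaningful outside long mode)
    cs_l: bool       # L bit of the current code-segment descriptor
    cs_d: bool       # D bit (default operand size) of the code-segment descriptor


def execution_mode(s: CpuState) -> str:
    """Classify the current mode following the real/protected/long split."""
    if not s.cr0_pe:
        return "real mode"  # always 16-bit code
    if s.efer_lma:
        # Long mode: the CS descriptor chooses 64-bit mode or compatibility mode.
        if s.cs_l:
            if s.cs_d:
                raise ValueError("L=1 together with D=1 is reserved")
            return "64-bit mode"
        return "compatibility mode, 32-bit" if s.cs_d else "compatibility mode, 16-bit"
    # Legacy protected mode, where virtual 8086 mode is still available.
    if s.eflags_vm:
        return "virtual 8086 mode"
    return "protected mode, 32-bit" if s.cs_d else "protected mode, 16-bit"


if __name__ == "__main__":
    # A 32-bit process under a 64-bit OS: long mode active, CS.L = 0, CS.D = 1.
    print(execution_mode(CpuState(cr0_pe=True, efer_lma=True, eflags_vm=False,
                                  cs_l=False, cs_d=True)))
```

Note that in this model EFLAGS.VM is never consulted once long mode is active, which mirrors the point that virtual 8086 mode is unavailable there.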

Wikipedia has a nice table of x86-64 operating modes including legacy and real modes, and all 3 sub-modes of long mode. Under a mainstream x86-64 OS, after booting, the CPU cores will always all be in long mode, switching between different sub-modes depending on whether user-space is 32 or 64-bit. (Not counting System Management Mode interrupts...)


Now what is the difference between 16 bit, 32 bit, and 64 bit mode?

16-bit and 32-bit mode are basically the same thing except for the following differences:

  • In 16 bit mode, the default address and operand width is 16 bit. You can change these to 32 bit for a single instruction using the 0x67 and 0x66 prefixes, respectively. In 32 bit mode, it's the other way round.
  • In 16 bit mode, the instruction pointer is truncated to 16 bits, so jumping to addresses above 65535 (0xFFFF) can lead to weird results.
  • VEX/EVEX encoded instructions (including those of the AVX, AVX2, BMI, BMI2 and AVX512 instruction sets) aren't decoded in real or Virtual 8086 mode (though they are available in 16 bit protected mode).
  • 16 bit mode has fewer addressing modes than 32 bit mode, though it is possible to override to a 32 bit addressing mode on a per-instruction basis if the need arises.
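The prefix rule in the first bullet of this list can be written as a small function. This is a sketch with a hypothetical name, effective_sizes; it assumes the caller has already separated the legacy prefix bytes from the opcode and ignores everything else about decoding:

```python
from __future__ import annotations


def effective_sizes(default_bits: int, prefixes: bytes) -> tuple[int, int]:
    """Return (operand_size, address_size) in bits for one instruction.

    Models only 16- and 32-bit code: 0x66 flips the operand size and 0x67
    flips the address size to the other width for that single instruction.
    """
    if default_bits not in (16, 32):
        raise ValueError("this sketch only models 16- and 32-bit code segments")
    other = 32 if default_bits == 16 else 16
    operand = other if 0x66 in prefixes else default_bits
    address = other if 0x67 in prefixes else default_bits
    return operand, address


if __name__ == "__main__":
    # The bytes 66 89 D8 are "mov eax, ebx" in 16-bit code but "mov ax, bx" in 32-bit code.
    print(effective_sizes(16, bytes([0x66])))  # (32, 16)
    print(effective_sizes(32, bytes([0x66])))  # (16, 32)
```

The symmetry is the point: the same prefix byte means "switch to the other width", so identical bytes decode to different operand sizes depending on the mode.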

Now, 64 bit mode is somewhat different. Most instructions behave just like in 32 bit mode, with the following differences:

  • There are eight additional registers named r8, r9, ..., r15. Each register can be used as a byte, word, dword, or qword register. The family of REX prefixes (0x40 to 0x4f) encode whether an operand refers to an old or new register. Eight additional SSE/AVX registers xmm8, xmm9, ..., xmm15 are also available.
  • You can only push/pop 64 bit and 16 bit quantities (though you shouldn't do the latter); 32 bit quantities cannot be pushed or popped.
  • The single-byte inc reg and dec reg instructions are unavailable because their opcode space has been repurposed for the REX prefixes. The two-byte inc r/m and dec r/m forms are still available, so inc reg and dec reg can still be encoded.
  • A new RIP-relative (instruction-pointer relative) addressing mode exists, reusing the shorter of the two redundant encodings 32-bit mode had for a [disp32] absolute address.
  • The default address width is 64 bit, a 32 bit address width can be selected through the 0x67 prefix. 16 bit addressing is unavailable.
  • The default operand width is 32 bit. A width of 16 bit can be selected through the 0x66 prefix, a 64 bit width can be selected through an appropriate REX prefix independently of which registers you use.
  • It is not possible to use ah, ch, dh, and bh in an instruction that requires a REX prefix. With a REX prefix, those register numbers instead mean spl, bpl, sil, and dil, the low 8 bits of sp, bp, si, and di, respectively.
  • Writing to the low 32 bits of a 64 bit register zeroes the upper 32 bits, avoiding false dependencies for out-of-order exec. (Writing 8 or 16-bit partial registers still merges with the 64-bit old value.)
  • As segmentation is nonfunctional, segment overrides are meaningless no-ops except for the fs and gs overrides (0x64, 0x65), which serve to support thread-local storage (TLS).
  • Also, many instructions that specifically deal with segmentation are unavailable. These are: push/pop seg (except push/pop fs/gs), arpl, call far (only the 0xff encoding is valid), les, lds, and jmp far (only the 0xff encoding is valid).
  • Instructions that deal with decimal arithmetic are unavailable. These are: daa, das, aaa, aas, aam, and aad.
  • Additionally, the following instructions are unavailable: bound (rarely used), into, pusha/popa (not useful with the additional registers), and salc (undocumented).
  • The 0x82 instruction alias for 0x80 is invalid.
  • On early AMD64 CPUs, lahf and sahf are unavailable.

And that's basically all of it!
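Several of the 64-bit differences listed above (the REX prefix layout, REX.W versus 0x66, the ah/spl register-number clash, and 32-bit writes zeroing the upper half) can be modeled in a few lines. All names here are hypothetical, and each function models only the rule it is named after; write_reg covers writes to the lowest byte, word, or dword of a register, not ah-style high-byte registers:

```python
from __future__ import annotations

from typing import NamedTuple


class Rex(NamedTuple):
    w: bool  # REX.W: 64-bit operand size
    r: bool  # REX.R: extends the ModRM reg field
    x: bool  # REX.X: extends the SIB index field
    b: bool  # REX.B: extends ModRM r/m, SIB base, or the register in the opcode


def decode_rex(byte: int) -> Rex | None:
    """Return the REX fields if `byte` is a REX prefix (0x40-0x4F), else None."""
    if (byte & 0xF0) != 0x40:
        return None
    return Rex(w=bool(byte & 0x8), r=bool(byte & 0x4), x=bool(byte & 0x2), b=bool(byte & 0x1))


def operand_size_64(rex: Rex | None, has_66: bool) -> int:
    """Operand size for most instructions in 64-bit mode (push/pop and near branches differ)."""
    if rex is not None and rex.w:
        return 64  # REX.W takes precedence over 0x66
    return 16 if has_66 else 32


LEGACY_BYTE_REGS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REX_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
                 "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"]


def byte_reg(num: int, rex_present: bool) -> str:
    """Name of 8-bit register number `num` (0-15, already extended by REX.R/B)."""
    if rex_present:
        return REX_BYTE_REGS[num]
    if num > 7:
        raise ValueError("registers 8-15 can only be reached with a REX prefix")
    return LEGACY_BYTE_REGS[num]


def write_reg(old: int, value: int, width: int) -> int:
    """New 64-bit register contents after writing `width` bits to its low part."""
    if width == 64:
        return value & 0xFFFF_FFFF_FFFF_FFFF
    if width == 32:
        return value & 0xFFFF_FFFF  # 32-bit writes zero the upper 32 bits
    mask = (1 << width) - 1  # 8- and 16-bit writes merge with the old value
    return (old & ~mask) | (value & mask)


if __name__ == "__main__":
    print(decode_rex(0x48))                                # Rex(w=True, r=False, x=False, b=False)
    print(byte_reg(4, False), byte_reg(4, True))           # ah spl
    print(hex(write_reg(0xFFFF_FFFF_FFFF_FFFF, 0x1, 32)))  # 0x1
```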

fuz
  • Better answer than mine! – BeeOnRope May 20 '17 at 22:39
  • there's also RIP-addressing mode which is not available in 16 or 32-bit mode – phuclv May 22 '17 at 02:24
  • @LưuVĩnhPhúc Addressed! – fuz May 22 '17 at 07:05
  • Regular VEX encodings aren't available in 16-bit mode either ([because those invalid opcodes were used intentionally as traps](https://stackoverflow.com/questions/37829075/is-pipelining-oooe-available-on-modern-x86-processors-when-running-in-real-mode#comment63139570_37831392), see also the link in the bottom of my answer that MichaelPetch is commenting on). So that means no AVX1/2 / FMA, and also some BMI instructions aren't usable. – Peter Cordes Jul 01 '20 at 10:43
  • @PeterCordes I'm not sure if that's just a Windows thing. In 16 bit protected mode, VEX opcodes should work just fine (not sure about EVEX). In real mode, they could be available, too. The post you linked mentions just Windows' DOS compatibility mode. They could have just turned off AVX in the appropriate MSR when they enter Virtual 8086 mode. – fuz Jul 01 '20 at 11:47
  • @PeterCordes Yeah okay. Checking the manuals again, VEX prefixes are indeed verboten in real mode and Virtual 8086 mode but they do work in 16 bit protected mode. – fuz Jul 01 '20 at 11:54
  • Ah, that makes sense, I hadn't considered they might be different in real vs. vm86 mode. Thanks for checking! – Peter Cordes Jul 01 '20 at 19:43
  • @BeeOnRope I think yours is better! (Sorry fuz) Literally the first word of it answers the question, while in fuz's I don't see either a yes or no. – Paul Jul 13 '20 at 16:56
  • @Paul That's because the answer is "it's mostly the same" rather than a hard yes or a hard no. There are some instructions you can't use in 64 bit mode (and vice versa), but by and large, 32 bit code works in 64 bit mode. – fuz Jul 13 '20 at 17:50
  • I think this comment should have been the first sentence of your answer :) – Paul Jul 13 '20 at 18:09

No, it isn't.

While there is a large amount of overlap, 64-bit assembly code is not a superset of 32-bit assembly code and so 32-bit assembly is not in general valid in 64-bit mode.

This applies both to the mnemonic assembly source (which is assembled into binary format by an assembler) and to the binary machine code format itself.

This question covers in some detail instructions that were removed, but there are also many encoding forms whose meanings were changed.

For example, Jester in the comments gives the example of push eax not being valid in 64-bit code. Based on this reference you can see that the 32-bit push is marked N.E. meaning not encodable. In 64-bit mode, the encoding is used to represent push rax (an 8-byte push) instead. So the same sequence of bytes has a different meaning in 32-bit mode versus 64-bit mode.
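To make the "same bytes, different meaning" point concrete, here is a toy decoder (hypothetical, covering only opcodes 0x40-0x5F and 0x90, nothing like a real disassembler). Byte 0x50 comes out as push eax in 32-bit mode and push rax in 64-bit mode, and 0x40 0x90 is inc eax (followed by a separate nop) in 32-bit mode but a single REX-prefixed nop in 64-bit mode, the case sivizius points out in the comments:

```python
from __future__ import annotations

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_one(code: bytes, mode: int) -> tuple[str, int]:
    """Decode the first instruction from a tiny opcode subset; return (text, length).

    Only 0x40-0x5F and 0x90 are modeled; anything else raises ValueError.
    """
    op = code[0]
    if mode == 32:
        reg = REGS32[op & 7]
        if 0x40 <= op <= 0x47:
            return f"inc {reg}", 1
        if 0x48 <= op <= 0x4F:
            return f"dec {reg}", 1
        if 0x50 <= op <= 0x57:
            return f"push {reg}", 1
        if 0x58 <= op <= 0x5F:
            return f"pop {reg}", 1
        if op == 0x90:
            return "nop", 1
    elif mode == 64:
        rex_b, length = 0, 1
        if 0x40 <= op <= 0x4F:  # a REX prefix here, not inc/dec
            rex_b, op, length = op & 1, code[1], 2
        reg = REGS64[(op & 7) | (rex_b << 3)]
        if 0x50 <= op <= 0x57:
            return f"push {reg}", length  # always a 64-bit push; no 32-bit form exists
        if 0x58 <= op <= 0x5F:
            return f"pop {reg}", length
        if op == 0x90 and not rex_b:
            return "nop", length
    raise ValueError(f"not modeled: {code.hex()} in {mode}-bit mode")


if __name__ == "__main__":
    for mode in (32, 64):
        print(mode, decode_one(bytes([0x50]), mode), decode_one(bytes([0x40, 0x90]), mode))
```

The decoder must be told the mode up front; nothing in the bytes themselves says which interpretation is intended.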

In general, you can browse the list of instructions on that site and find many which are listed as invalid or not encodable in 64-bit.

If not, please provide a small example of 32-bit assembly code that isn't valid 64-bit assembly code and explain how the 64-bit processor executes the 32-bit assembly code.

As above, push eax is one such example. I think what is missing is that 64-bit CPUs support directly running 32-bit binaries. They don't do it via compatibility between 32-bit and 64-bit instructions at the machine-language level, but simply by having a 32-bit mode, in which the decoders (in particular) interpret the instruction stream as 32-bit x86 rather than x86-64, alongside the so-called long mode for running 64-bit instructions. When such 64-bit chips were first released, it was common to run a 32-bit operating system, which pretty much means the chip stays permanently in this mode (never goes into 64-bit mode).

More recently, it is typical to run a 64-bit operating system, which is aware of the modes, and which will put the CPU into 32-bit mode when the user launches a 32-bit process (which are still very common: until very recently my browser was still 32-bit).

All the details and proper terminology for the modes can be found in fuz's answer, which is really the one you should read.

BeeOnRope