In 16 bit x86 assembly you would write:
mov ax, @DATA
mov ds, ax
What is the exact equivalent for 32-bit code? I don't know what the data segment is in 32-bit mode.
mov eax, @DATA
mov ? , eax
Thank you for your answers!
The simple answer
The 32-bit code looks exactly like the 16-bit code:
mov ax, @DATA
mov ds, ax
... because the selector registers (cs, ds, es, ss, fs and gs) are still only 16 bits wide in 32-bit code. Therefore the "segment values" are also 16 bits wide, and the lower 16 bits of the general purpose registers (e.g. ax) have not been renamed.
The more complex answer
Only a few object file formats support segment selectors in 32-bit code!
With most formats, the line mov ax, @DATA will be rejected by the assembler because there is simply no way to represent this line in the object file!
Most 32-bit operating systems use a "flat" memory layout. This means that cs, ds, es and ss have a base address of 0 and a limit of 4 GiB. In other words: the whole address space can be addressed directly without any need to change the values of the selector registers.
For this reason most object file formats don't even support this feature.
There are few operating systems that really use the selector registers in 32-bit code. For such systems you'll have to use development software (assembler, compiler, linker ...) that uses an object file format supporting this!
If you have such software (and you use such an operating system) the code is identical to the 16-bit code (as shown above).
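As a minimal sketch of the flat-model case described above, assuming 32-bit Linux, NASM and the ELF32 object format (none of which the question specifies; the label msg and the build commands are illustrative assumptions), data can be addressed directly without setting up ds at all:

```nasm
; Assumptions: 32-bit Linux, NASM, flat memory model.
; Build (assumed toolchain): nasm -f elf32 flat.asm -o flat.o
;                            ld -m elf_i386 flat.o -o flat
section .data
msg:    db "hello", 10          ; hypothetical data: 6 bytes
msglen  equ $ - msg

section .text
global _start
_start:
    ; No "mov ds, ..." here: the OS already loaded ds with a flat selector.
    mov eax, 4                  ; sys_write (32-bit int 0x80 ABI)
    mov ebx, 1                  ; fd 1 = stdout
    mov ecx, msg                ; plain 32-bit offset into the flat address space
    mov edx, msglen
    int 0x80

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```

The address of msg is an ordinary 32-bit relocation, which every common 32-bit object format can express; no selector relocation is involved.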
EDIT
After having read the comments I want to clarify the following sentence:
There are few operating systems that really use the selector registers in 32-bit code...
What is meant here is: only a few 32-bit operating systems use different values for the selector registers to access different sections (code, data, const, BSS...) in an executable file. (There are, however, other uses for different values in the selector registers.)
Most operating systems use the same selector value for all segments/sections in the executable file, and these operating systems already initialize the selector registers to the correct value. Therefore an instruction mov ax, @DATA does not really make sense. Unlike the instruction mov ax, ds (which has the same effect under these conditions), an instruction mov ax, @DATA would also require a special feature (a special "relocation") in the object file format, which will simply not be implemented because such an instruction does not make any sense.
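To illustrate the mov ax, ds alternative mentioned above, here is a small hedged fragment. It assumes a flat-model 32-bit process in which the OS has already loaded ds; copying that value into es is just one possible use, chosen for illustration:

```nasm
bits 32
; Assumption: flat-model 32-bit process, ds already set up by the OS.
    mov ax, ds          ; read the selector the OS gave us (no relocation needed)
    mov es, ax          ; es now holds the same selector as ds

    ; Equivalent alternative that goes through the stack instead of ax:
    push ds
    pop es
```

Neither variant needs anything from the object file format, because the selector value comes from a register at run time rather than from a symbol such as @DATA.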
However, there are a few 32-bit operating systems which do not use a "flat" memory layout but use different selector values for different data segments in a program. Such operating systems must use different object file formats, of course. For such operating systems an instruction mov ax, @DATA is definitely supported. However, I doubt that there are assemblers that allow assigning selector values to 32-bit registers (mov eax, @DATA).
(You'd normally only write segment registers in kernel mode, as part of saving/restoring user-space context. In user-space under normal OSes with flat memory, use wrfsbase to modify the base address of fs for thread-local storage, or wrgsbase for gs. These instructions require the FSGSBASE CPU feature and are only available in 64-bit mode.)
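A minimal sketch of what that looks like in NASM syntax. It assumes 64-bit mode, a CPU with FSGSBASE, and an OS that has enabled the feature for user space (otherwise wrfsbase faults); the names tls_block and set_fs_base are hypothetical:

```nasm
bits 64
; Assumptions: 64-bit mode, FSGSBASE supported by the CPU and enabled by the OS.
section .bss
tls_block:  resb 64             ; hypothetical per-thread storage area

section .text
set_fs_base:
    lea rax, [rel tls_block]    ; address we want fs-relative accesses to start at
    wrfsbase rax                ; fs base = rax; the fs selector itself is untouched
    rdfsbase rdx                ; read the base back: rdx == rax
    ret
```

In a process that uses a C library, the fs base is typically already in use for thread-local storage, so overwriting it like this is only reasonable in code that owns the thread's TLS setup.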
Intel recommends writing mov ds, eax to avoid wasting space on an operand-size prefix in 32-bit or 64-bit mode. However, NASM and YASM make this optimization for you. @Cody reports that MASM won't assemble mov fs, eax or mov eax, fs. The GNU assembler matches what NASM/YASM do.
; NASM/YASM machine code output
mov fs, ax ; 8e e0
mov fs, eax ; 8e e0
mov fs, rax ; 8e e0
mov fs, r10d ; 41 8e e2 (rex.w=0)
It matters when going the other direction:
mov ax, fs ; 66 8c e0 only modifies AX, leaving upper bits
mov eax, fs ; 8c e0 zeros whole rax (on P6 and later CPUs, undefined upper bytes on earlier)
mov rax, fs ; 8c e0 zeros whole rax (YASM: 48 8c e0)
mov r10d, fs ; 41 8c e2 (rex.w=0)
mov r10, fs ; 49 8c e2 (rex.w=1)
But note that a mov eax, fs / mov fs, eax round trip is still always safe, even on an old CPU where it could leave high garbage (see below), because the mov fs, eax ignores the high 2 bytes.
From Intel's vol. 2 manual (instruction-set reference), the mov entry says this about the reg,Sreg and Sreg,reg forms:
When operating in 32-bit mode and moving data between a segment register and a general-purpose register, the 32-bit IA-32 processors do not require the use of the 16-bit operand-size prefix (a byte with the value 66H) with this instruction, but most assemblers will insert it if the standard form of the instruction is used (for example, MOV DS, AX). [...]
When the processor executes the instruction with a 32-bit general-purpose register, it assumes that the 16 least-significant bits of the general-purpose register are the destination or source operand.
This is a confusing way to say that the segment reg value itself is read from or stored in the low 16.
It's confusing because it seems to imply (incorrectly) that the high 16 bits of a GP register are not part of the destination. The next part is still talking only about 32-bit (and 64-bit) destination registers, not GP destination registers in general:
If the register is a destination operand, the resulting value in the two high-order bytes of the register is implementation dependent. For the Pentium 4, Intel Xeon, and P6 family processors, the two high-order bytes are filled with zeros; for earlier 32-bit IA-32 processors, the two high order bytes are undefined.
So beware that on old CPUs (like P5 Pentium and earlier), mov eax, fs may not zero-extend, and may leave garbage instead. But mov fs, eax is always safe.
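If code has to produce a zero-extended selector value even on such old CPUs, one option is to do the zero-extension explicitly. This is a sketch of two ways to do that in 32-bit code, not something the Intel manual prescribes:

```nasm
bits 32
; Option 1: clear the destination first, then use the 16-bit form.
    xor eax, eax
    mov ax, fs          ; 66 8c e0 in 32-bit mode: writes only AX

; Option 2: use the shorter 32-bit form, then zero-extend explicitly.
    mov ecx, fs         ; 8c e1: upper 16 bits undefined on pre-P6 CPUs
    movzx ecx, cx       ; force the upper 16 bits to zero
```

Option 1 pays for the operand-size prefix; option 2 pays for an extra instruction. On P6 and later, a plain mov eax, fs already zero-extends, so neither is needed there.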
With a memory operand, it's always a 16-bit load or store, even if you use REX.W=1 (which NASM will encode with a qword size override, but YASM chokes on it).
Intel's manual says there's a MOV r/m64,Sreg form, but that's bogus; it does not zero-extend to 64-bit with a memory destination.
This came up recently in a clang assembler bug about assembling movw %fs, (%rsi) with a 66 operand-size prefix (which turns out to be redundant). Some of the above is copy/pasted from what I wrote there.
I investigated by watching registers and memory with a debugger while single-stepping this NASM program:
global _start
_start:
mov rsi, rsp
mov eax, 0xdeadbeef
mov [rsi], eax
mov dword [rsi+4], 0xbadf00d
;;; memory is set up
firstbreak:
mov [rsi], fs ; 8c 26 16-bit store
;mov dword [rsi], fs ; not encodable (even in 32-bit mode)
mov qword [rsi], fs ; 48 8c 26 YASM chokes; NASM assembles it as REX.W 8c 26. Still a 16-bit store!!!
mov ax, fs ; 66 8c e0 only modifies AX, leaving upper bits
mov eax, fs ; 8c e0 zeros whole rax
mov rax, fs ; 8c e0 zeros whole rax (YASM: 48 8c e0)
mov r10d, fs ; 41 8c e2 (rex.w=0)
mov r10, fs ; 49 8c e2 (rex.w=1)
xor ebx,ebx
mov eax,1
int 0x80 ; sys_exit(0) (32-bit ABI so you can more easily assemble this as 32-bit code)