In MOVMSKPD reg, xmm, VMOVMSKPD reg, xmm2, and VMOVMSKPD reg, ymm2,
I think reg is r32 or r64.
But when I tested this in MASM, I got the following results:
MOVMSKPD rbx, xmm0 ;OK, 66| 48/ 0F 50 D8
MOVMSKPD ebx, xmm0 ;OK, 66| 0F 50 D8
I doubt that this result is correct, especially because it has the 48 prefix: 48h is a REX prefix with the W bit set.
In contrast, the following two lines are encoded as exactly the same machine code, both with VEX.W = 0.
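For reference, here is a minimal Python sketch (my own illustration; the function name decode_rex is hypothetical) that splits a REX byte, laid out as 0100WRXB, into its fields. It confirms that 48h sets only REX.W and none of the register-extension bits:

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (bit pattern 0100WRXB) into its four fields."""
    if (byte & 0xF0) != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # operand-size bit (64-bit where the instruction honours it)
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm or SIB.base
    }

print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```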
000004F9 C5 F9/ 50 D8 VMOVMSKPD rbx, xmm0
000004FD C5 F9/ 50 D8 VMOVMSKPD ebx, xmm0
Are all of these encoded correctly?
I used ml64.exe, and the target is x86_64 (64-bit mode).
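The two-byte VEX form (first byte C5) has no W field at all: W is implicitly 0 and the opcode map is implicitly 0F. A minimal Python sketch (my own; decode_vex2 is a hypothetical name) of decoding the second byte, assuming the inverted R and vvvv fields described for the VEX prefix in Chapter 2 of Intel SDM Vol. 2:

```python
PP = {0b00: None, 0b01: "66", 0b10: "F3", 0b11: "F2"}  # implied SIMD prefix

def decode_vex2(b0: int, b1: int) -> dict:
    """Decode a two-byte VEX prefix (C5 xx); R and vvvv are stored inverted."""
    if b0 != 0xC5:
        raise ValueError("not a two-byte VEX prefix")
    return {
        "R": ((b1 >> 7) & 1) ^ 1,           # extends ModRM.reg
        "vvvv": ((b1 >> 3) & 0xF) ^ 0xF,    # extra source register; 0 here = unused
        "L": 256 if (b1 >> 2) & 1 else 128,  # vector length
        "pp": PP[b1 & 0b11],
        "map": "0F",                         # implied by the C5 form
        "W": 0,                              # the C5 form cannot encode W = 1
    }

print(decode_vex2(0xC5, 0xF9))  # L = 128, pp = 66
print(decode_vex2(0xC5, 0xFD))  # L = 256, pp = 66
```

Both F9 and FD decode to R = 0, vvvv unused and pp = 66; they differ only in L, which matches the xmm and ymm lines of the listing.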
[test2.asm]
;MOVMSKPD reg, xmm ;66 0F 50 /r
MOVMSKPD rbx, xmm0 ;OK, with a 48 prefix
MOVMSKPD ebx, xmm0 ;OK, without a 48 prefix
;VMOVMSKPD reg, xmm2 ;VEX.128.66.0F.WIG 50 /r
VMOVMSKPD rbx, xmm0 ;OK
VMOVMSKPD ebx, xmm0 ;OK, but the same machine code as above.
;VMOVMSKPD reg, ymm2 ;VEX.256.66.0F.WIG 50 /r
VMOVMSKPD rbx, ymm0 ;OK
VMOVMSKPD ebx, ymm0 ;OK, but the same machine code as above.
[test2.lst]
;MOVMSKPD reg, xmm ;66 0F 50 /r
000004F0 66| 48/ 0F 50 D8 MOVMSKPD rbx, xmm0
000004F5 66| 0F 50 D8 MOVMSKPD ebx, xmm0
;VMOVMSKPD reg, xmm2 ;VEX.128.66.0F.WIG 50 /r
000004F9 C5 F9/ 50 D8 VMOVMSKPD rbx, xmm0
000004FD C5 F9/ 50 D8 VMOVMSKPD ebx, xmm0
;VMOVMSKPD reg, ymm2 ;VEX.256.66.0F.WIG 50 /r
00000501 C5 FD/ 50 D8 VMOVMSKPD rbx, ymm0
00000505 C5 FD/ 50 D8 VMOVMSKPD ebx, ymm0
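All six lines end in the same ModRM byte, D8. A small Python sketch (my own; the register tables simply list the standard encoding order) shows that its reg field is 3 (the BX family) and its r/m field is 0 with mod = 11 (xmm0). So the ModRM byte by itself cannot distinguish ebx from rbx; only an operand-size bit could:

```python
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(b: int) -> tuple[int, int, int]:
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    return (b >> 6) & 0b11, (b >> 3) & 0b111, b & 0b111

mod, reg, rm = decode_modrm(0xD8)
print(mod, reg, rm)                        # 3 3 0
print(GPR32[reg], GPR64[reg], f"xmm{rm}")  # ebx rbx xmm0
```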
- Note: I would like to add something that is not directly related but may be relevant,
about the operand-size override prefix (0x66).
In the Intel Manual PDF, I found a sentence that reads "Use of this prefix with MMX, SSE, and/or SSE2 instructions is reserved and may cause unpredictable behavior."
http://gec.di.uminho.pt/Discip/Lesi/AC10203/docs/P4ISAformat.pdf
CHAPTER 2 INSTRUCTION FORMAT
2.2. INSTRUCTION PREFIXES
The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes.
Either operand size can be the default. This prefix selects the non-default size.
Use of this prefix with MMX, SSE, and/or SSE2 instructions is reserved and may cause unpredictable behavior (see the note below).
NOTE
Some of the SSE and SSE2 instructions have three-byte opcodes. For these three-byte opcodes, the third opcode byte may be F2H, F3H, or 66H.
For example, the SSE2 instruction CVTDQ2PD has the three-byte opcode F3 0F E6.
The third opcode byte of these three-byte opcodes should not be thought of as a prefix, even though it has the same encoding as the operand size prefix (66H) or one of the repeat prefixes (F2H and F3H).
As described above, using the operand size and repeat prefixes with SSE and SSE2 instructions is reserved.
It should also be noted that execution of SSE2 instructions on an Intel processor that does not support SSE2 (CPUID Feature flag register EDX bit 26 is clear) will result in unpredictable code execution.
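To illustrate the note, here is a toy Python lookup table (a tiny hand-picked subset, not a real decoder) in which the 66h/F2h/F3h byte is part of the opcode key rather than an operand-size modifier. Removing the 66 from 66 0F 50 gives MOVMSKPS, not a 32-bit MOVMSKPD:

```python
# Toy opcode table: (mandatory prefix or None, opcode bytes) -> mnemonic.
OPCODES = {
    (None, (0x0F, 0x50)): "MOVMSKPS",   # NP 0F 50 /r
    (0x66, (0x0F, 0x50)): "MOVMSKPD",   # 66 0F 50 /r
    (0x66, (0x0F, 0xE6)): "CVTTPD2DQ",  # 66 0F E6 /r
    (0xF3, (0x0F, 0xE6)): "CVTDQ2PD",   # F3 0F E6 /r, the example in the note
    (0xF2, (0x0F, 0xE6)): "CVTPD2DQ",   # F2 0F E6 /r
}

def lookup(prefix, opcode):
    """Return the mnemonic for a mandatory prefix plus opcode bytes, if listed."""
    return OPCODES.get((prefix, tuple(opcode)), "(not in this toy table)")

print(lookup(0x66, [0x0F, 0x50]))  # MOVMSKPD
print(lookup(None, [0x0F, 0x50]))  # MOVMSKPS
```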
I think the REX.W bit of the REX prefix plays a role similar to the 66h prefix (both select an operand size).
So I suspect that the REX.W bit can be used freely only in legacy instructions, and cannot be applied to MMX, SSE, and/or SSE2 instructions at one's own discretion.
In an MMX/SSE/AVX instruction, I think REX.W can be used safely if it is written in the opcode column of the instruction table, but if it is not written there, it cannot be added at one's own discretion.
I think as follows:
- in a legacy instruction, one can use the REX.W bit almost freely with GPR operands, especially to select the size of the destination operand.
- in MMX/SSE/SSE2/AVX/AVX2/AVX512 instructions, one can't use the REX.W bit freely.
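For legacy GPR instructions, the operand-size rule in 64-bit mode can be sketched in Python as follows (my own summary; default64 stands for the few instructions, such as PUSH and POP, whose default size is 64 bits). It deliberately does not cover SSE instructions like MOVMSKPD, where 66h is a mandatory prefix, which is exactly the case in question:

```python
def legacy_operand_size(rex_w: bool, has_66: bool, default64: bool = False) -> int:
    """Effective operand size in bits for a legacy GPR instruction in 64-bit mode."""
    if rex_w:
        return 64                   # REX.W = 1 takes precedence over 66h
    if has_66:
        return 16                   # 66h selects the 16-bit size
    return 64 if default64 else 32  # otherwise the instruction's default size

print(legacy_operand_size(rex_w=True, has_66=True))   # 64
print(legacy_operand_size(rex_w=False, has_66=True))  # 16
```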
PS1, 2023/07/24, 01:38, JST
AMD Manual: "AMD 64-Bit Technology", 24108C, January 2001
Zero-Extension of Results.
In 64-bit mode, when performing 32-bit operations with a GPR destination, the processor zero-extends the 32-bit result into the full 64-bit destination.
8-bit and 16-bit operations on GPRs preserve all unwritten upper bits of the destination GPR.
This is consistent with legacy 16-bit and 32-bit semantics for partial-width results.
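The AMD rule can be modelled in a few lines of Python (my own sketch; write_gpr is a hypothetical helper, and the AH/BH/CH/DH high-byte registers are not modelled):

```python
MASK64 = (1 << 64) - 1

def write_gpr(old: int, result: int, size: int) -> int:
    """New 64-bit register value after an operation writes `result` with `size` bits."""
    if size == 64:
        return result & MASK64
    if size == 32:
        return result & 0xFFFF_FFFF  # 32-bit result is zero-extended to 64 bits
    if size in (8, 16):
        low = (1 << size) - 1
        return (old & (MASK64 ^ low)) | (result & low)  # unwritten upper bits preserved
    raise ValueError("size must be 8, 16, 32 or 64")

old = 0x1122_3344_5566_7788
print(hex(write_gpr(old, 0xAABB_CCDD, 32)))  # 0xaabbccdd
print(hex(write_gpr(old, 0xAABB, 16)))       # 0x112233445566aabb
```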
Intel Manual: MOVMSKPD—Extract Packed Double Precision Floating-Point Sign Mask
Operation
(V)MOVMSKPD (128-bit Versions)
DEST[0] := SRC[63]
DEST[1] := SRC[127]
IF DEST = r32
THEN DEST[31:2] := 0;
ELSE DEST[63:2] := 0;
FI
VMOVMSKPD (VEX.256 Encoded Version)
DEST[0] := SRC[63]
DEST[1] := SRC[127]
DEST[2] := SRC[191]
DEST[3] := SRC[255]
IF DEST = r32
THEN DEST[31:4] := 0;
ELSE DEST[63:4] := 0;
FI
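A Python model of this Operation section (my own sketch, using struct to obtain the IEEE-754 bit pattern of each double) may make the bit positions clearer: bit i of the mask is the sign bit of element i, i.e. SRC[63], SRC[127], SRC[191] and SRC[255]:

```python
import struct

def movmskpd(src: list[float]) -> int:
    """Sign mask of 2 (xmm) or 4 (ymm) packed doubles; higher mask bits stay zero."""
    if len(src) not in (2, 4):
        raise ValueError("expected 2 or 4 doubles")
    mask = 0
    for i, x in enumerate(src):
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # raw 64-bit pattern
        mask |= (bits >> 63) << i                             # sign bit -> mask bit i
    return mask

print(f"{movmskpd([-1.0, 2.0]):02b}")                          # 01
print(f"{movmskpd([-0.0, -3.0, 1.0, float('-inf')]):04b}")     # 1011
```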
In the Intel pseudo-code above, note the following part:
IF DEST = r32
THEN DEST[31:2] := 0;
ELSE DEST[63:2] := 0;
FI
But according to the AMD manual, in 64-bit mode, if the destination is a 32-bit general-purpose register (GPR), the upper 32 bits of the underlying 64-bit GPR are cleared to zero.
And according to the Intel manual, if reg is 32-bit then DEST[31:2] := 0, and if reg is 64-bit then DEST[63:2] := 0.
I think these two descriptions are inconsistent if the operand size is selected by the REX.W bit under the AMD general rule.
If the AMD general rule applies, a 32-bit destination has the upper 32 bits of the underlying register cleared, so that effectively DEST[63:2] := 0. If that is correct, then writing the following in a manual
IF DEST = r32
THEN DEST[31:2] := 0;
ELSE DEST[63:2] := 0;
FI
does NOT make sense, because in both cases, DEST[63:2] := 0.
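A toy Python comparison makes the argument concrete (my own sketch; the "preserving" function is only the hypothetical reading of the r32 branch, not documented behaviour). Under the AMD rule, the r32 branch and the r64 branch leave the same 64-bit value in RBX for every old value:

```python
MASK64 = (1 << 64) - 1

def r32_branch_amd(old_rbx: int, mask: int) -> int:
    # DEST = r32: DEST[31:2] := 0, then the AMD rule zero-extends the 32-bit write.
    return mask & 0xFFFF_FFFF

def r32_branch_preserving(old_rbx: int, mask: int) -> int:
    # Hypothetical reading: bits 63:32 keep their old value.
    return (old_rbx & ~0xFFFF_FFFF & MASK64) | (mask & 0xFFFF_FFFF)

def r64_branch(old_rbx: int, mask: int) -> int:
    # DEST = r64: DEST[63:2] := 0 (old_rbx is unused, kept for a uniform signature).
    return mask & MASK64

old = 0xFFFF_FFFF_FFFF_FFFF
print(hex(r32_branch_amd(old, 0b10)), hex(r64_branch(old, 0b10)))  # 0x2 0x2
print(hex(r32_branch_preserving(old, 0b10)))                         # 0xffffffff00000002
```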
My point is that, "assuming everything is correct", if the instruction MOVMSKPD ebx, xmm0
exists in 64-bit mode, then it does not obey the AMD general rule, so the two are inconsistent.
I used a proof by contradiction (reductio ad absurdum).
PS2, 2023/07/24, 03:05, JST
I think that:
reg is r32 if the CPU is in 32-bit mode (e.g. compatibility mode),
reg is r64 if the CPU is in 64-bit mode (the 64-bit sub-mode of long mode),
and this cannot be controlled by the REX.W bit or the VEX.W bit.
So I think either the instruction MOVMSKPD ebx, xmm0 does NOT exist
in 64-bit mode, or the "Operation" pseudo-code in the Intel Manual is not accurate regarding the upper 32 bits of the destination register. According to the AMD general rule, if the destination operand is a 32-bit GPR while the CPU is in 64-bit mode, the upper 32 bits of the underlying 64-bit GPR are cleared to zero, but the Intel pseudo-code for DEST = r32 only clears DEST[31:2], which reads as if the upper 32 bits of the underlying 64-bit GPR were preserved.
PS3, 2023/07/24, 17:24, JST
PS4, 2023/07/25, 01:22, JST
I found the encoding for MOVMSKPS in Appendix B of the Intel Manual, listed under Special Case Instructions Promoted Using REX.W.
Vol. 2D B-63
INSTRUCTION FORMATS AND ENCODINGS
B.13 SPECIAL ENCODINGS FOR 64-BIT MODE
The following Pentium, P6, MMX, SSE, SSE2, SSE3 instructions are promoted to 64-bit operation in IA-32e mode by using REX.W. However, these entries are special cases that do not follow the general rules (specified in Section B.4).
Table B-34. Special Case Instructions Promoted Using REX.W (Contd.)
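As a cross-check of the ml64 listing, here is a hypothetical Python encoder (my own sketch, register form only) for 66 [REX] 0F 50 /r. It reproduces the bytes ml64 emitted, with the REX prefix placed after the mandatory 66h and immediately before 0F:

```python
def encode_movmskpd(gpr: int, xmm: int, rex_w: bool = False) -> bytes:
    """Legacy-SSE encoding 66 [REX] 0F 50 /r for MOVMSKPD gpr, xmm (register form only)."""
    if not (0 <= gpr < 16 and 0 <= xmm < 16):
        raise ValueError("register numbers must be 0..15")
    rex = 0x40 | (rex_w << 3) | ((gpr >> 3) << 2) | (xmm >> 3)  # W, R (ModRM.reg), B (ModRM.rm)
    modrm = 0xC0 | ((gpr & 7) << 3) | (xmm & 7)                 # mod = 11: register operands
    out = bytearray([0x66])         # mandatory prefix comes first
    if rex != 0x40:                 # emit REX only if some bit is set
        out.append(rex)
    out += bytes([0x0F, 0x50, modrm])
    return bytes(out)

print(encode_movmskpd(3, 0, rex_w=True).hex(" "))  # 66 48 0f 50 d8
print(encode_movmskpd(3, 0).hex(" "))              # 66 0f 50 d8
```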
PS5, 2023/07/25, 05:49, JST
3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix)
REX.W — Indicates the use of a REX prefix that affects operand size or instruction semantics. The ordering of the REX prefix and other optional/mandatory instruction prefixes are discussed Chapter 2. Note that REX prefixes that promote legacy instructions to 64-bit behavior are not listed explicitly in the opcode column.
I think it is important that this note limits the omission of 64-bit-promoting REX prefixes from the opcode column to "legacy instructions" only. Therefore, for non-legacy instructions, REX prefixes that promote them to 64-bit behavior are, in principle, listed explicitly in the opcode column.
3.1.1.3 Instruction Column in the Opcode Summary Table
reg — A general-purpose register used for instructions when the width of the register does not matter to the semantics of the operation of the instruction. The register can be r16, r32, or r64.
On the other hand:
r/m8 — A byte operand that is either the contents of a byte general-purpose register (AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL, and SIL) or a byte from memory. Byte registers R8B - R15B are available using REX.R in 64-bit mode.
- In the case of r/m8, r represents an 8-bit GPR.
- In the case of r/m16, r represents a 16-bit GPR.
- In the case of r/m32, r represents a 32-bit GPR.
- In the case of r/m64, r represents a 64-bit GPR.
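Which name a given register number denotes depends on the operand width, and for byte registers on whether any REX prefix is present (numbers 4 to 7 mean AH/CH/DH/BH without REX, and SPL/BPL/SIL/DIL with it). A small Python sketch (my own; gpr_name is a hypothetical helper):

```python
NAMES = {
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    8: ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"],  # with a REX prefix
}
LEGACY8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]     # without a REX prefix

def gpr_name(num: int, width: int, has_rex: bool = False) -> str:
    """Name of GPR number `num` (0..15) used with a `width`-bit operand."""
    if not 0 <= num < 16:
        raise ValueError("register number must be 0..15")
    if num > 7 and not has_rex:
        raise ValueError("registers 8..15 need a REX prefix")
    if width == 8 and not has_rex:
        return LEGACY8[num]
    if num < 8:
        return NAMES[width][num]
    return f"r{num}" + {64: "", 32: "d", 16: "w", 8: "b"}[width]

print(gpr_name(4, 8), gpr_name(4, 8, has_rex=True))  # ah spl
```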
I think that in reg/m8, reg represents an r16/r32/r64 GPR, and the width of the register does not matter to the semantics of the instruction.
So, I think
- In the case of reg/m8, reg represents an r16/r32/r64 GPR.
- In the case of reg/m16, reg represents an r16/r32/r64 GPR.
- In the case of reg/m32, reg represents an r16/r32/r64 GPR.
- In the case of reg/m64, reg represents an r16/r32/r64 GPR.
PS6, 2023/07/25, 06:32, JST
Intel Manual
3-12 Vol. 1
3.4.1.1 General-Purpose Registers in 64-Bit Mode
When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:
- 64-bit operands generate a 64-bit result in the destination general-purpose register.
- 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
- 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.
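The last point can be sketched in Python (my own illustration; sign_extend is a hypothetical helper that computes what an instruction such as MOVSX RAX, AX would leave in the full 64-bit register):

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a 64-bit two's-complement pattern."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):       # sign bit set: value is negative
        value -= 1 << bits
    return value & ((1 << 64) - 1)

print(hex(sign_extend(0xFFFE, 16)))  # 0xfffffffffffffffe, i.e. -2 as a 64-bit value
print(hex(sign_extend(0x7FFE, 16)))  # 0x7ffe
```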