64-bit and 16-bit (but not 32-bit) pushes are both possible in 64-bit mode. But normally you only want 64-bit stack operations.
MASM supports two syntaxes for 16-bit pushes. (I tested with jwasm -Zne, which disables extensions that MASM wouldn't support, since I don't have MASM itself):
pushw 123 ; can assemble to 66 6a 7b push sign_extended_imm8
push word ptr 123 ; JWASM uses 66 68 7b 00 push imm16
It seems insane to me to use ptr for an immediate; I'd have expected that to be the syntax for pushing a memory source operand with an absolute addressing mode, but that would be push word ptr [123]. MASM syntax often doesn't make sense.
(Forcing the longer encoding with word ptr vs. pushw might be unique to JWASM, which treats it as the equivalent of NASM push strict word 123. Agner Fog's objconv -fmasm disassembles 66 6A 7B to push word ptr 123. Prefer pushw because of JWASM.)
In NASM it's push word 123; in GAS .intel_syntax noprefix it's pushw 123. GAS Intel syntax is MASM-like and also assembles push word ptr 123 the same way. AT&T syntax is of course pushw $0x1234; operand-size suffixes are standard in AT&T syntax, whereas in Intel syntax they're a special case for instructions with an implicit memory operand.
To set FLAGS / RFLAGS
If you only need to modify the low 8 bits of FLAGS (the condition codes other than OF), use mov eax, 0x00003400 / sahf (Store AH into Flags). Or, for example, lahf / or ah, 1 / sahf to inefficiently emulate stc (Set Carry Flag).
To set RFLAGS, you want push 0x1234 (a qword push of a sign-extended imm32) / popfq. FLAGS is the low 16 bits of RFLAGS (https://en.wikipedia.org/wiki/FLAGS_register). In 64-bit mode, stack operations always use RSP, not ESP.
MASM / JWASM assemble popf as a 16-bit pop rather than the default size for the mode, so you need popfq. Unfortunately you can't even use popfw to make it explicit; you'd need a comment. (Or use a better assembler like NASM, where pushf / popf use the same default operand-size as push 123.)
If you want to avoid overwriting the reserved and special bits above FLAGS (the upper 16 bits of EFLAGS, and the reserved upper half of RFLAGS) with zeros, i.e. just modify FLAGS without touching the rest of RFLAGS/EFLAGS, you could use this inefficient method. It causes a store-forwarding stall, because popfq's qword load overlaps the recent word store. popf and popfq are slowish anyway, because microcode has to check whether you're setting or clearing special flags like IF. (https://agner.org/optimize/, e.g. 13 cycles on Zen 3/4, 20 cycles on Skylake.)
pushfq ; qword push
mov word ptr [rsp], 0x1234 ; modify the low word
popfq ; qword pop
Or with a 16-bit push, if temporary stack misalignment is safe (see below)
pushw 0x1234
popf ; MASM assembles this as popfw (16-bit)
The push imm16 encoding has an LCP (length-changing prefix) stall when decoding on Intel CPUs. If you can't avoid using popf or popfq in code where performance matters, you might consider mov eax, 0x1234 / push ax instead. Or not, since LCP stalls only happen during legacy decode, not when running from the uop cache.
Windows makes it unsafe to even temporarily misalign RSP by 8?
Joshua comments that Windows can randomly crash your process if RSP is ever misaligned (not a multiple of 8). I don't know the mechanism for this, but perhaps delivery of SEH exceptions if that's possible at that point in your code?
Joshua suggests that a 2-byte push could crash if it needs to grow the stack, because you'd enter the stack-growth handler with RSP misaligned. And there could be other mechanisms that wouldn't be fixed by making sure this isn't the deepest the stack has ever grown.
We know normal Windows code can use 8-byte push / pop in function prologues / epilogues, since compilers do that, so whatever the requirement is, it can't be the alignment by 16 that the function-calling convention requires.
I seem to recall something about stack-unwind metadata requiring that Windows x64 functions only modify RSP at all during their prologue and epilogue, but I think C compilers for Windows do support alloca, so that can't be fully true. Of course, alloca will round up the stack adjustment to keep RSP aligned. Probably that requirement to not move RSP at all in the middle of a function only applies if you aren't using RBP as a frame pointer. If someone has something authoritative I could link re: what's safe to do with RSP in Windows programs, let me know.
I'd be surprised if Linux had any problem delivering a signal to a thread where RSP wasn't aligned by 8 (or with anything else). The ABI guarantees RSP % 16 == 8 on function entry (so does Windows), so signal delivery has to re-align the stack for the handler: a signal can arrive between any two instructions, and code can definitely use qword pushes (so can Windows). I assume the kernel does something like:
user_regs.rsp -= 128;  // preserve the red zone
user_regs.rsp &= -16;  // align