In MASM64, if I write the instruction push 0, it will push a 64-bit immediate on the stack (i.e. RSP = RSP - 8).

So if I just want to push a 16-bit immediate to set FLAGS, I know of no way other than writing the machine code directly, such as:

.code
FlagFunction PROC
    dd 00006866h; push a 16-bit immediate 0
    popf
    ret
FlagFunction ENDP
END

The program works but I wonder if there is an actual instruction for this in MASM64.

Peter Cordes
  • `popf` still pops 64 bit so you should push 64 bits. See the "Operation" section of the relevant instruction set reference page which specifically says: _"RFLAGS = Pop(); (* 64-bit pop *)"_ – Jester Oct 10 '21 at 14:14
  • If you use *masm64*, maybe you use *msvc (cl)* too. In that case you can use [`__writeeflags`](https://learn.microsoft.com/en-us/cpp/intrinsics/writeeflags?view=msvc-160) – RbMm Oct 10 '21 at 14:20
  • Normally you should `pushf` all 64 bits of `rflags`, modify only the bits that you care about, and `popf`. Many of the bits of `rflags` are reserved and you should not try to unconditionally clear them. – Nate Eldredge Oct 10 '21 at 19:36
  • A 64-bit push will always affect RSP, not just ESP like the first sentence of the question says. And BTW, in a better assembler like NASM, this is easy: `push word 0` vs. `push qword 0`. (With `push 0` defaulting to `push qword`) – Peter Cordes Oct 10 '21 at 20:33
  • Have you tried using a MASM-syntax disassembler on the machine code you want? If there is syntax for it, that might show it to you. – Peter Cordes Oct 13 '21 at 19:33
  • We think this code doesn't work. – Joshua Mar 03 '23 at 01:15

3 Answers

64-bit and 16-bit (but not 32-bit) pushes are both possible in 64-bit mode. But normally you only want 64-bit stack operations.

MASM supports two syntaxes for 16-bit pushes. (I tested with jwasm -Zne to disable extensions that MASM wouldn't support, since I don't have MASM itself):

 pushw  123                 ; can assemble to 66 6a 7b   push sign_extended_imm8
 push  word ptr 123         ; JWASM uses      66 68 7b 00  push imm16

It seems insane to me to use ptr for an immediate; I'd have expected that to be the syntax for pushing a memory source operand with an absolute addressing mode, but that would be push word ptr [123]. MASM syntax often doesn't make sense.

(Forcing the longer encoding with word ptr vs. pushw might be unique to JWASM, treating it as the equivalent of NASM push strict word 123. Agner Fog's objconv -fmasm disassembles 66 6A 7B to push word ptr 123. Prefer pushw, since JWASM assembles it to the shorter encoding.)
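To make the difference between those two encodings concrete, here is a rough model in C of the choice an assembler makes for a 16-bit push of an immediate. This is only a sketch: the function name encode_push16 and its strict parameter are made up for illustration and don't correspond to anything MASM or JWASM exposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the machine code for a 16-bit push of `imm`
 * into `out` (which must hold at least 4 bytes) and returns its length.
 * With strict == 0 it picks the short form when imm fits in a signed byte;
 * with strict != 0 it always emits the imm16 form. */
size_t encode_push16(int16_t imm, int strict, uint8_t out[4])
{
    out[0] = 0x66;                      /* operand-size prefix: 16-bit push */
    if (!strict && imm >= -128 && imm <= 127) {
        out[1] = 0x6A;                  /* push imm8, sign-extended to 16 bits */
        out[2] = (uint8_t)imm;
        return 3;
    }
    out[1] = 0x68;                      /* push imm16 */
    out[2] = (uint8_t)((uint16_t)imm & 0xFF);   /* low byte first (little-endian) */
    out[3] = (uint8_t)((uint16_t)imm >> 8);
    return 4;
}
```

Calling it with 123 and strict == 0 gives the three bytes 66 6a 7b from the pushw line, and strict == 1 gives 66 68 7b 00. With an immediate of 0 and strict == 1 it produces 66 68 00 00, the same bytes as dd 00006866h in the question.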


In NASM it's push word 123, in GAS .intel_syntax noprefix it's pushw 123. GAS Intel syntax is MASM-like and also assembles push word ptr 123 the same way. AT&T syntax is of course pushw $0x1234; operand-size suffixes are standard for AT&T syntax, vs. a special case for instructions with an implicit memory operand.


To set FLAGS / RFLAGS

If you only need to modify the low 8 bits of FLAGS (condition codes other than OF), use mov eax, 0x00003400 / sahf - Store AH into Flags. Or for example lahf / or ah, 1 / sahf to inefficiently emulate stc (Set Carry Flag).
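As a sanity check on which bits sahf actually touches, here is a small C model of it. It's a sketch with made-up names (sahf_model, emulated_stc), and it assumes the reserved bits of FLAGS hold their usual fixed values, so that lahf is just a copy of the low byte.

```c
#include <stdint.h>

/* Bits of FLAGS that SAHF writes: SF(7), ZF(6), AF(4), PF(2), CF(0). */
#define SAHF_MASK 0xD5u

/* Hypothetical model of SAHF: returns the new FLAGS value given the old
 * FLAGS and the AH register. Bits 1, 3 and 5 of AH are ignored, and
 * everything above bit 7 (including OF) is left alone. */
uint16_t sahf_model(uint16_t flags, uint8_t ah)
{
    return (uint16_t)((flags & ~SAHF_MASK) | (ah & SAHF_MASK));
}

/* Hypothetical model of "lahf / or ah, 1 / sahf", i.e. an emulated stc. */
uint16_t emulated_stc(uint16_t flags)
{
    uint8_t ah = (uint8_t)(flags & 0xFF);   /* lahf: AH = low byte of FLAGS */
    ah |= 1;                                /* or ah, 1: set the CF position */
    return sahf_model(flags, ah);
}
```

With AH = 0x34 as in the mov eax example, only the AF and PF positions survive the mask; bit 5 of AH is dropped.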

To set RFLAGS, you want push 0x1234 (qword push of a sign extended imm32) / popfq. FLAGS is the low 16 bits of RFLAGS. (https://en.wikipedia.org/wiki/FLAGS_register).

Stack operations will always affect RSP, not ESP.

MASM / JWASM assemble popf as a 16-bit pop rather than the default size for the mode, so you need popfq. Unfortunately you can't even write popfw to make the 16-bit size explicit; you'd need a comment. (Or use a better assembler like NASM where pushf/popf use the same default operand-size as push 123.)

If you wanted to avoid overwriting the reserved and special bits in the upper part of RFLAGS/EFLAGS with zeros (i.e. just modify the 16-bit FLAGS without touching the rest), you could use this inefficient method (with a store-forwarding stall from the wide load containing a recent narrow store). popf and popfq are slowish anyway because microcode has to see if you're setting/clearing special flags like IF. (https://agner.org/optimize/, e.g. 13 cycles on Zen 3/4, 20 cycles on Skylake.)

  pushfq                                ; qword push
  mov  word ptr [rsp], 0x1234           ; modify the low word
  popfq                                 ; qword pop

Or with a 16-bit push, if temporary stack misalignment is safe (see below)

  pushw  0x1234
  popf                 ; MASM / JWASM assemble this as the 16-bit pop (popfw)

The push imm16 encoding has an LCP stall when decoding on Intel CPUs. You might consider mov eax, 0x1234 / push ax if you can't avoid using popf or popfq in code where performance matters. Or not, since LCP stalls only happen during legacy decode, not from the uop cache.
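Going back to the pushfq / mov word ptr [rsp] / popfq sequence above: the value it hands to popfq is just the saved RFLAGS image with its low 16 bits swapped out. A minimal C sketch of that merge (replace_low_word is a made-up name, and this doesn't model which bits the CPU will actually let you change, which depends on privilege level and reserved bits):

```c
#include <stdint.h>

/* Hypothetical model: the value popfq is asked to load after the low word
 * of the saved RFLAGS image on the stack has been overwritten. */
uint64_t replace_low_word(uint64_t saved_rflags, uint16_t new_flags)
{
    /* Keep everything above bit 15, replace the 16-bit FLAGS part. */
    return (saved_rflags & ~(uint64_t)0xFFFF) | new_flags;
}
```

A plain push 0x1234 / popfq instead asks for zeros in all the upper bits, which is the difference this method is meant to avoid.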


Windows makes it unsafe to even temporarily misalign RSP by 8?

Joshua comments that Windows can randomly crash your process if RSP is ever misaligned (not a multiple of 8). I don't know the mechanism for this, but perhaps delivery of SEH exceptions if that's possible at that point in your code?

Joshua suggests that a 2-byte push could crash if it needs to grow the stack, because you'd enter the stack growth handler with RSP not aligned. And there can be other possible mechanisms which might not be fixable by making sure this isn't the deepest the stack's ever grown.

We know normal Windows code can use 8-byte push / pop in function prologues / epilogues since compilers do that, so it's not the same alignment by 16 that the function-calling convention requires.

I seem to recall something about stack-unwind metadata requiring that Windows x64 functions only modify RSP at all during their prologue and epilogue, but I think C compilers for Windows do support alloca so that can't be fully true. Of course, alloca will round up the stack adjustment to keep RSP aligned. Probably that requirement to not move RSP at all in the middle of a function only applies if you aren't using RBP as a frame pointer. If someone has something authoritative I could link re: what's safe to do with RSP in Windows programs, let me know.

I'd be surprised if Linux had any problem delivering a signal to a thread where RSP wasn't aligned by 8. (Or with anything else). The ABI has guarantees involving RSP % 16 == 8 on function entry (so does Windows), so signal stack handling has to re-align because a signal could be delivered at any point, between any two instructions, and code can definitely use push-qword (so can Windows). I assume the kernel uses something like user_regs.rsp -= 128; (preserve the red zone) followed by user_regs.rsp &= -16; (align).
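To spell out that guess as something you can run: this is a sketch of the arithmetic only, not actual kernel code. The name signal_frame_top is made up, and the 128 is the x86-64 System V red-zone size.

```c
#include <stdint.h>

#define RED_ZONE_SIZE 128u   /* x86-64 System V red zone below RSP */

/* Hypothetical helper, not actual kernel code: picks a 16-byte-aligned
 * address below the interrupted RSP and its red zone. */
uint64_t signal_frame_top(uint64_t user_rsp)
{
    uint64_t sp = user_rsp - RED_ZONE_SIZE;  /* preserve the red zone */
    sp &= ~(uint64_t)15;                     /* align down to 16 bytes */
    return sp;
}
```

The point is that the result is 16-byte aligned no matter what the low bits of the incoming RSP were, so an RSP left off by 2 after a pushw wouldn't matter to code that starts from an address computed this way.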

Peter Cordes
  • alloca assembles to `div 16`, `mul 16`, `call ___checkstk_ms`, `sub rsp, rax` keeping the stack pointer aligned. (I'm pretty sure the `div` and `mul` turn into something better if I turn optimizations on.) – Joshua Mar 05 '23 at 23:10
  • @Joshua: I think what I was remembering (about not being allowed to change RSP except in the prologue and epilogue) would only apply if you weren't using RBP as a frame pointer, which gives stack unwinding a fixed reference point. Functions using `alloca` do have to use RBP as a frame pointer. – Peter Cordes Mar 05 '23 at 23:16

According to the Intel manual here: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

PUSH imm16 is allowed in 64-bit mode. When you push a 16-bit immediate with push imm16 in x86-64 assembly, the value is not extended to 64 bits: the instruction adjusts RSP by 2 instead of 8 and stores only the 16-bit value on the stack. This is the pseudocode the manual gives for the operation:

IF StackAddrSize = 64
THEN
    IF OperandSize = 64
    THEN
        RSP := RSP – 8;
        Memory[SS:RSP] := SRC; (* push quadword *)
    ELSE IF OperandSize = 32
    THEN
        RSP := RSP – 4;
        Memory[SS:RSP] := SRC; (* push dword *)
    ELSE (* OperandSize = 16 *)
        RSP := RSP – 2;
        Memory[SS:RSP] := SRC; (* push word *)
FI
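As a rough C model of the stack-pointer update in that pseudocode (a sketch; rsp_after_push is a made-up name, and it only covers the StackAddrSize = 64 case). It treats a 32-bit operand size as not encodable, on the basis that 64-bit mode only offers 64-bit and 16-bit pushes:

```c
#include <stdint.h>

/* Hypothetical model of the RSP update for a push in 64-bit mode.
 * operand_size is in bits. Returns the new RSP, or the old RSP unchanged
 * if the size is not one that can be encoded in 64-bit mode. */
uint64_t rsp_after_push(uint64_t rsp, unsigned operand_size)
{
    switch (operand_size) {
    case 64: return rsp - 8;   /* default: push quadword */
    case 16: return rsp - 2;   /* with a 66h prefix: push word */
    default: return rsp;       /* 32-bit pushes are not encodable here */
    }
}
```

Starting from an 8-byte-aligned RSP, the 16-bit case leaves RSP no longer a multiple of 8 until the matching 2-byte pop.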
Danny Cohen
  • And if you're on Windows, don't use this instruction. It will crash stuff randomly because it can't tolerate rsp misaligned. The same is probably true of Linux as well unless you have no signal handlers (which is surprisingly likely). – Joshua Mar 03 '23 at 01:12
  • @Joshua: I'd be surprised if Linux had any problem delivering a signal to a thread where RSP wasn't aligned by 8. The ABI has guarantees involving RSP % 16 == 8 on function entry, so signal stack handling has to re-align because a signal could be delivered at any point, between any two instructions, and code can definitely use push-qword. I assume the kernel uses something like `user_regs.rsp -= 128;` // preserve the red-zone `user_regs.rsp &= -16;` // align – Peter Cordes Mar 03 '23 at 04:45
  • @Danny: note that the `ELSE IF OperandSize = 32` part of that pseudocode is unreachable; it's impossible to encode a `push` with 32-bit operand-size in 64-bit mode, only 16 or 64. (The description section is also phrased in a way that implies dword might be possible, but the table of valid / invalid forms at the top is correct. https://www.felixcloutier.com/x86/push . I've [tested on an Intel Skylake CPU](https://stackoverflow.com/q/45127993)), and a REX.W=0 prefix does *not* reduce the operand-size from the default 64 down to 32. – Peter Cordes Mar 03 '23 at 04:48
  • This answer is missing the MASM syntax for how to tell it you want a 16-bit immediate. In NASM it's `push word 0x1234`, in GAS `.intel_syntax noprefix` it's `pushw 0x1234`, but I don't know the MASM equivalent. Agner Fog's `objconv -fmasm` disassembles `66 6A 7B` to `push word ptr 123`, but I don't have MASM to test if that assembles back; it looks crazy to me to use `ptr` for an immediate operand. – Peter Cordes Mar 03 '23 at 04:49
  • @Joshua: Do you have more information on how exactly Windows can crash your process if it what, delivers an SEH between `pushw` and `popf`? Is that because there's no way to encode a 2-byte offset in the stack-unwind metadata it relies on to track RSP updates for every RIP? Perhaps if you use an RBP frame pointer it could be ok? AFAIK, Linux doesn't have a mechanism like SEH to catch signals as C++ exceptions, so except for debugging purposes, stack-unwinding can maybe get away with not tracking pushw / popfw with `.cfi` metadata directives. – Peter Cordes Mar 05 '23 at 22:40
  • @PeterCordes: I had a rather ugly method involving interaction with the antivirus but you've got a better one. If that two byte push triggers a page fault to grow the stack your process will die because you tried to enter the stack grow handler with RSP not aligned. – Joshua Mar 05 '23 at 22:57

As far as I know PUSH and POP always use QWord in x64