
I'm trying to push a 64-bit integer, but when assembling, NASM seems to treat the operand as a DWORD rather than a QWORD.

I'm using ASM to create the shellcode I need to inject a 64-bit DLL into a 64-bit process. The three `0xACEACEACACEACEAC` values are placeholders that are filled in at runtime: the first QWORD is the old instruction pointer, the second is the address containing the address of the DLL, and the third is the address of LoadLibrary.

    section .text
    global _start

    _start:
    BITS 64
    PUSH QWORD 0xACEACEACACEACEAC
    PUSHFQ
    push rax
    PUSH QWORD 0xACEACEACACEACEAC
    MOV RAX, 0xACEACEACACEACEAC
    CALL RAX
    pop RAX
    POPFQ
    RETN

user2272296
  • Related: [Can I add 64bit constants to 64bit registers?](https://stackoverflow.com/questions/20020589/can-i-add-64bit-constants-to-64bit-registers) for an ALU-instruction version of this, and see also [`mov r64, imm64` vs. loading it from memory](https://stackoverflow.com/questions/46433208/which-is-faster-imm64-or-m64-for-x86-64). – Peter Cordes Mar 02 '18 at 01:17
  • Somewhat related: [How many bytes does the push instruction push onto the stack when I don't specify the operand size?](https://stackoverflow.com/q/45127993) – Peter Cordes May 25 '23 at 04:55

1 Answer


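There is no `push imm64` instruction: `push` takes at most a 32-bit immediate, which is sign-extended to 64 bits in 64-bit mode. As a workaround you can do one of the following:

  1. go through a register: `mov rax, 0xACEACEACACEACEAC` / `push rax`
  2. go through memory: `push qword [rel foo]`
  3. write it in two parts: `push dword low32` / `mov dword [rsp+4], high32`, or `sub rsp, 8` / `mov dword [rsp], low32` / `mov dword [rsp+4], high32`
  4. rely on sign-extension if your immediate allows it, that is, if the value fits in a sign-extended 32-bit immediate (for example `push -1`)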
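
The four workarounds above can be sketched as one small NASM program. This is only an illustration under stated assumptions: it targets Linux x86-64 rather than the Windows shellcode in the question, and the names `VALUE` and `value_in_memory`, the file name, and the exit-status check are hypothetical additions that are not part of the original answer.

```nasm
; Sketch only: assumes Linux x86-64 and NASM. One possible build line:
;   nasm -f elf64 push64.asm && ld -o push64 push64.o && ./push64; echo $?
BITS 64

VALUE equ 0xACEACEACACEACEAC        ; placeholder constant from the question

section .rodata
value_in_memory: dq VALUE           ; hypothetical label, used by workaround 2

section .text
global _start

_start:
    ; 1. Through a register: mov r64, imm64 exists even though push imm64 does not.
    mov   rax, VALUE
    push  rax

    ; 2. Through memory: push a qword addressed RIP-relatively.
    push  qword [rel value_in_memory]

    ; 3. In two halves: reserve the 8-byte slot, then store each dword.
    sub   rsp, 8
    mov   dword [rsp], VALUE & 0xFFFFFFFF   ; low 32 bits
    mov   dword [rsp+4], VALUE >> 32        ; high 32 bits

    ; 4. Sign-extension: works only for values that fit in a signed
    ;    32-bit immediate, so VALUE itself does not qualify; -2 does.
    push  -2                        ; stores 0xFFFFFFFFFFFFFFFE

    ; Self-check: pop the four slots in reverse order and compare.
    mov   rdx, VALUE                ; reference copy for the comparisons
    mov   edi, 1                    ; exit status 1 unless every check passes
    pop   rcx                       ; slot written by workaround 4
    cmp   rcx, -2
    jne   .exit
    pop   rcx                       ; slot written by workaround 3
    cmp   rcx, rdx
    jne   .exit
    pop   rcx                       ; slot written by workaround 2
    cmp   rcx, rdx
    jne   .exit
    pop   rcx                       ; slot written by workaround 1
    cmp   rcx, rdx
    jne   .exit
    xor   edi, edi                  ; all four slots matched
.exit:
    mov   eax, 60                   ; Linux sys_exit
    syscall
```

The program exits with status 0 only when workarounds 1 to 3 each leave the same 64-bit value on the stack. In shellcode where every register must be preserved, the two-store form of workaround 3 has the advantage of needing no scratch register and no separate data.
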
Jester
  • Is it possible to use two push commands, the first to push the lower 32 bits and the second to push the upper 32? This seems to be the only feasible solution. – user2272296 Jun 05 '13 at 17:49
  • That isn't much different from choice #3, why isn't that good? – Jester Jun 05 '13 at 19:36
  • There doesn't seem to be a way to encode `push` with a 32-bit operand-size anyway (even with a register source). Intel's insn ref manual says you can, with `REX.W`, but it doesn't work with NASM/YASM/GNU as, or on a real CPU (by sticking a `db 0x40` (REX.W=0) or `db 0x48` (REX.W=1) in front of `push rdx`). Tested on Intel SnB, while single-stepping with GDB. – Peter Cordes Dec 22 '15 at 11:07
  • Also, method **1** is almost certainly the best, as far as overall performance and code size are concerned. `push m64` (**2**) decodes to 2 uops on Intel and AMD, and the load can miss in cache. (Assuming **4** isn't usable.) – Peter Cordes Dec 22 '15 at 11:33
  • I see this a lot in obfuscated code, and it's neat because you don't have to worry about restoring clobbered registers. `push rbp; lea rbp, [location]; xchg rbp, [rsp]` I have no idea if it's efficient, or whether a `mov` would be better than a `lea`, @PeterCordes? – Orwellophile May 16 '21 at 08:43
  • @Orwellophile: `xchg` with memory is an atomic RMW (implicit `lock` prefix), so it's total garbage for performance. How do you propose using `mov`? For position-independent code, RIP-relative LEA is the only good way to get an address into a register. The alternative would be something like `call next_insn` / `next_insn: add qword [rsp], location - $` to do a RIP-relative push. A memory-destination add still isn't great, but at least `call rel32=0` doesn't desync ret prediction. Or if `[location]` was a 32-bit absolute (default abs not rel), then `push location` is obviously best. – Peter Cordes May 16 '21 at 13:37
  • @Orwellophile: Re: getting an address into a register: RIP-relative LEA is usually better than `mov r64, imm64`. [How to load address of function or label into register](https://stackoverflow.com/q/57212012). Although if you have a symbol table for runtime fixups (or position-*dependent* code), mov 64-bit absolute is possible. – Peter Cordes May 16 '21 at 13:41
  • @PeterCordes I was just reading my Agner, and his comments on XCHG sent me back here to check for a reply. Originally I had thought `push rbp; mov rbp, 64bit_const; xchg rbp, [rsp]` might be a neat alternative to **1**; the `lea` version (instead of **2**) was plainly absurd in retrospect. Learning x86-64 by reverse engineering obfuscated code can leave one with very odd ideas of "good practice." `push rcx; pop rdx` instead of `mov` is particularly annoying because it can't be "fixed", as the `mov` requires 3 bytes. Not that push+pop is awesome obfu, they just like to vary the pattern. – Orwellophile May 17 '21 at 13:20
  • @Orwellophile: yeah, push/pop is useful for code-golf to save bytes if you have to copy a full 64-bit register. [Tips for golfing in x86/x64 machine code](https://codegolf.stackexchange.com/a/160739). If you can tell from the surrounding code that ECX was already zero-extended into RCX at that point, you *can* use `mov edx, ecx` (2 bytes). Re: trying to avoid dirtying any registers: as usual, trying to preserve all registers over very short scales costs a lot of performance. It's totally normal to use some scratch registers, and efficient instruction sequences depend on it. – Peter Cordes May 17 '21 at 16:52
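
The size trade-off discussed in the last two comments can be sketched in NASM. The label name is hypothetical, and the byte values in the comments are the usual encodings for these instructions; assembling with `nasm -f bin` and inspecting the result with `ndisasm -b 64` is one way to confirm them on your own toolchain.

```nasm
; Sketch: three ways to copy RCX into RDX, with their usual encodings.
BITS 64

copy_variants:                      ; hypothetical label
    push  rcx                       ; 51
    pop   rdx                       ; 5A        2 bytes in total, goes through the stack

    mov   rdx, rcx                  ; 48 89 CA  3 bytes because of the REX.W prefix

    mov   edx, ecx                  ; 89 CA     2 bytes, zero-extends into RDX
```

The 32-bit `mov` is a drop-in replacement only when the upper half of RCX is known to be zero, which is why the 2-byte push/pop pair cannot always be patched back into a `mov` of the same length.
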