
I can push 4 bytes onto the stack by doing this:

push DWORD 123

But I have found out that I can use push without specifying the operand size:

push 123

In this case, how many bytes does the push instruction push onto the stack? Does the number of bytes pushed depend on the operand size (so in my example it will push 1 byte)?

Peter Cordes
  • 6
    The native register size, to keep the stack aligned. In 32-bit mode, it will push 4 bytes. In 64-bit mode, it will push 8 bytes. – Cody Gray - on strike Jul 16 '17 at 11:26
  • 1
    BTW, you could have tested this with a debugger. Just single-step the instruction and see how `esp`/`rsp` changes. You could also look at the disassembly output and notice that they both assemble to the same machine code. – Peter Cordes Jul 16 '17 at 23:59

2 Answers


Does the number of bytes pushed depend on the operand size

It doesn't depend on the value of the number. The technical x86 term for how many bytes push pushes is "operand-size", but that's a separate thing from whether the number fits in an imm8 or not.

See also Does each PUSH instruction push a multiple of 8 bytes on x64?

(so in my example it will push 1 byte)?

No, the size of the immediate is not the operand-size. It always pushes 4 bytes in 32-bit code, or 8 bytes in 64-bit code, unless you do something weird.

Recommendation: always just write push 123 or push 0x12345 to use the default push size for the mode you're in, and let the assembler pick the encoding. That is almost always what you want. If that's all you wanted to know, you can stop reading now.


First of all, it's useful to know what sizes of push are even possible in x86 machine code:

  • In 16-bit mode, you can push 16 or (with operand-size prefix on 386 and later) 32 bits.
  • In 32-bit mode, you can push 32 or (with operand-size prefix) 16 bits.
  • In 64-bit mode, you can push 64 or (with operand-size prefix) 16 bits.
    A REX.W=0 prefix does not let you encode a 32-bit push (see footnote 1).

There are no other options. The stack pointer is always decremented by the operand-size of the push (see footnote 2), so it's possible to "misalign" the stack by pushing 16 bits. pop has the same choices of size: 16, 32, or 64, except no 32-bit pop in 64-bit mode.
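One way to check the sizes listed above is to measure how far the stack pointer moves. This is a minimal sketch that assumes x86-64 Linux (the `_start` entry point and the `sys_exit` system call number are assumptions about the environment, not something the sizes depend on); it exits with the total number of bytes pushed as its status.

```nasm
; Assumed build: nasm -felf64 pushsize.asm && ld pushsize.o -o pushsize
bits 64
global _start
section .text
_start:
    mov   rbx, rsp        ; remember the starting stack pointer
    push  123             ; default operand-size in 64-bit mode: RSP -= 8
    mov   rcx, rbx
    sub   rcx, rsp        ; rcx = 8
    push  word 123        ; 0x66 operand-size prefix: RSP -= 2
    mov   rdi, rbx
    sub   rdi, rsp        ; rdi = 10 bytes pushed in total (8 + 2)
    mov   rsp, rbx        ; discard both pushes, restoring alignment
    mov   eax, 60         ; Linux sys_exit; exit status is the low 8 bits of edi
    syscall
```

Running the program and then `echo $?` should print 10. Single-stepping the two pushes in a debugger shows the same thing without the Linux-specific exit.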

This applies whether you're pushing a register or an immediate, and regardless of whether the immediate fits in a sign-extended imm8 or it needs an imm32 (or imm16 for 16-bit pushes). (A 64-bit push imm32 sign-extends to 64-bit. There is no push imm64, only mov reg, imm64)
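To illustrate the sign-extension point, here is a small 64-bit fragment meant to be single-stepped in a debugger. The register choices are arbitrary and the encodings in the comments assume NASM's default immediate optimization.

```nasm
bits 64
    push  -2              ; 6A FE: imm8, sign-extended to 0xFFFFFFFFFFFFFFFE
    pop   rax             ; rax = 0xFFFFFFFFFFFFFFFE
    push  0x12345678      ; 68 78563412: imm32, sign-extended to 0x0000000012345678
    pop   rdx             ; rdx = 0x0000000012345678
    ; There is no push imm64, so a full 64-bit constant has to go through a register:
    mov   rcx, 0x123456789ABCDEF0
    push  rcx
    pop   rcx             ; balance the stack again
```

Both immediate forms move RSP by 8; only the number of immediate bytes in the instruction differs.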

In NASM source code, push 123 assembles to the operand-size that matches the mode you're in. In your case, I think you're writing 32-bit code, so push 123 is a 32-bit push, even though it can (and does) use the push imm8 encoding.

Your assembler always knows what kind of code it's assembling, since it has to know when to use or not use operand-size prefixes when you do force the operand-size.

MASM is the same; the only thing that might be different is the syntax for forcing a different operand-size.

Anything you write in assembler will assemble to one of the valid machine-code options (because the people that wrote the assembler know what is and isn't encodeable), so no, you can't push a single byte with a push instruction. If you wanted that, you could emulate it with dec esp / mov byte [esp], 123
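A sketch of that emulation for 32-bit code, including the matching "pop" (the read-back into EAX is only there for illustration):

```nasm
bits 32
    dec   esp                 ; reserve one byte (this misaligns the stack)
    mov   byte [esp], 123     ; store the byte at the new top of stack
    ; ... code that uses the byte ...
    movzx eax, byte [esp]     ; read it back: eax = 123
    inc   esp                 ; release the byte, restoring ESP
```

Any push, call, or interrupt frame created while ESP is odd lands at a misaligned address, so this is only worth doing when the one-byte layout really matters.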


NASM Examples:

Output from nasm -l /dev/stdout to dump a listing to the terminal, along with the original source line.

Lightly edited to separate opcode and prefix bytes from the operands. (Unlike objdump -drwC -Mintel, NASM's disassembly format doesn't leave spaces between bytes in the machine-code hexdump).

 68 80000000         push 128
 6A 80               push -128                 ;; signed imm8 is -128 to +127
 6A 7B               push byte 123
 6A 7B               push dword 123            ;; still optimized to the imm8 encoding
 68 7B000000         push strict dword 123
 6A 80               push strict byte 0x80     ;; will decode as push -128
 ******************       warning: signed byte value exceeds bounds [-w+number-overflow]

dword is normally an operand-size thing, while strict dword is how you request that the assembler doesn't optimize it to a smaller encoding.

All the preceding instructions are 32-bit pushes (or 64-bit in 64-bit mode, with the same machine code). All the following instructions are 16-bit pushes, regardless of what mode you assemble them in. (If assembled in 16-bit mode, they won't have a 0x66 operand-size prefix)

 66 6A 7B            push word 123
 66 68 8000          push word 128
 66 68 7B00          push strict word 123

NASM seems to treat the byte and dword overrides as applying to the size of the immediate, while word applies to the operand-size of the instruction. Using o32 push 12 in 64-bit mode doesn't get a warning either. push eax is rejected, though: "error: instruction not supported in 64-bit mode".

Notice that push imm8 is encoded as 6A ib in all modes. With no operand-size prefix, the operand size is the mode's size. (e.g. 6A FF decodes in long mode as a 64-bit operand-size push with an operand of -1, decrementing RSP by 8 and doing an 8-byte store.)


The address-size prefix only affects the explicit addressing mode used for push with a memory-source, e.g. in 64-bit mode: push qword [rsi] (no prefixes) vs. push qword [esi] (address-size prefix for 32-bit addressing mode). push dword [rsi] is not encodeable, because nothing can make the operand-size 32-bit in 64-bit code (see footnote 1). push qword [esi] does not truncate rsp to 32-bit. Apparently "Stack Address Width" is a different thing, probably set in a segment descriptor. (It's always 64 in 64-bit code on a normal OS, I think even for Linux's x32 ABI: ILP32 in long mode.)
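Side by side, the memory-source forms look like this in NASM's 64-bit mode. The encodings in the comments are what I'd expect for the plain [rsi] / [esi] addressing modes (FF /6 with ModRM 0x36); check them against a listing from your own assembler.

```nasm
bits 64
    push  qword [rsi]     ; FF 36     64-bit push, 64-bit addressing
    push  qword [esi]     ; 67 FF 36  address-size prefix; still a 64-bit push using RSP
    push  word  [rsi]     ; 66 FF 36  operand-size prefix; 16-bit push
    ; push dword [rsi]    ; rejected by the assembler: no 32-bit push in 64-bit mode
```

The 0x67 prefix changes only how the source address is computed, while the 0x66 prefix changes how many bytes are loaded and pushed.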


When would you ever want to push 16 bits? If you're writing in asm for performance reasons, then probably never. In my code-golf adler32, a narrow push -> wide pop took fewer bytes of code than shift/OR to combine two 16b integers into a 32b value.
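The narrow-push / wide-pop trick can be sketched like this for 32-bit code. This is an illustration of the idea, not the code from the linked answer; using DX as the high half and AX as the low half is an assumption made for the example.

```nasm
bits 32
    ; Inputs (hypothetical): DX = high 16 bits, AX = low 16 bits
    push  dx              ; 66 52: ESP -= 2, high half goes to the higher address
    push  ax              ; 66 50: ESP -= 2, low half is now at [esp]
    pop   eax             ; 58:    ESP += 4, eax = (DX << 16) | AX
```

That is 5 bytes of code, and the stack pointer ends up where it started. The wide pop reads across two narrower stores, which is a store-forwarding stall on typical CPUs, so it's a size trick rather than a speed trick.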

Or maybe in an exploit for 64-bit code, you might want to push some data onto the stack without gaps. You can't just use push imm32, because that sign-extends to 64-bit. You could do it in 16-bit chunks with multiple 16-bit push instructions. But it's still probably more efficient to mov rax, imm64 / push rax (10B+1B = 11B for an 8B imm payload). Or push 0xDEADBEEF / mov dword [rsp+4], 0xDEADC0DE (5B + 8B = 13B and doesn't need a register). Four 16-bit pushes would take 16B.
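The two compact options look like this in NASM syntax. Both leave the same qword, 0xDEADC0DEDEADBEEF, on top of the stack; the payload value is just the example constant from the text, and NASM may warn that 0xDEADBEEF doesn't fit a signed dword.

```nasm
bits 64
    ; Option 1: 10 + 1 = 11 bytes, clobbers RAX
    mov   rax, 0xDEADC0DEDEADBEEF
    push  rax

    ; Option 2: 5 + 8 = 13 bytes, no register needed
    push  0xDEADBEEF                    ; stores 0xFFFFFFFFDEADBEEF (sign-extended)
    mov   dword [rsp+4], 0xDEADC0DE     ; overwrite the sign-extension in the high half
```

Option 2 relies on the high dword being rewritten right after the push, so the sign-extended bits never matter.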


Footnotes:

  1. In fact REX.W=0 is ignored, and doesn't modify the operand-size away from its default 64-bit. NASM, YASM, and GAS all assemble push r12 to 41 54, not 49 54. GNU objdump thinks 49 54 is unusual, and decodes it as 49 54 rex.WB push r12. (Both execute the same). Microsoft agrees as well, using a 40h REX as padding on push rbx in some Windows DLLs.

    Intel just says that 32-bit pushes are "not encodeable" (N.E. in the table) in long mode. I don't understand why W=1 isn't the standard encoding for push / pop when a REX prefix is needed, but apparently the choice is arbitrary.

    Fun-fact: only stack instructions and a few others default to 64-bit operand size in 64-bit mode. In machine code, add rax, rdx needs a REX prefix (with the W bit set). Otherwise it would decode as add eax, edx. But you can't decrease the operand-size with a REX.W=0 when it defaults to 64-bit, only increase it when it defaults to 32.

    http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix lists the instructions that default to 64-bit in 64-bit mode. Note that jrcxz doesn't strictly belong in that list, because the register it checks (cx/ecx/rcx) is determined by address-size, not operand-size, so it can be overridden to 32-bit (but not 16-bit) in 64-bit mode. loop is the same.

    It's strange that Intel's instruction reference manual entry for push (HTML extract: http://felixcloutier.com/x86/PUSH.html) shows what would happen for a 32-bit operand-size push in 64-bit mode (the only case where stack address width can be 64, so it uses rsp). Perhaps it's achievable somehow with some non-standard settings in the code-segment descriptor, so you can't do it in normal 64-bit code running under a normal OS. Or more likely it's an oversight, and that's what would happen if it was encodeable, but it's not.

  2. Except segment registers are 16-bit, but a normal push fs will still decrement the stack pointer by the stack-width (operand-size). Intel documents that recent Intel CPUs only do a 16b store in that case, leaving the rest of the 32 or 64b unmodified.

    x86 doesn't officially have a stack width that's enforced in hardware. It's a software / calling convention term, e.g. char and short args passed on the stack in any calling convention are padded out to 4B or 8B, so the stack stays aligned. (Modern 32 and 64-bit calling conventions such as the x86-32 System V psABI used by Linux keep the stack 16B aligned before function calls, even though an arg "slot" on the stack is still only 4B). Anyway, "stack width" is only a programming convention on any architecture.

    The closest thing in the x86 ISA to a "stack width" is the default operand-size of push/pop. But you can manipulate the stack pointer however you want, e.g. sub esp,1. You can, but don't for performance reasons :P

Peter Cordes
  • I have a question. You said that _It always pushes 4 bytes in 32-bit code, or 64 in 64-bit code, unless you do something weird_ and _In 64-bit mode, you can push 64 or (with operand-size prefix) 16 bits_. Does that mean setting the operand size explicitly by hand is weird? – St.Antario Feb 09 '18 at 19:29
  • Anyway... It still seems like some magic to me... Why can't 64-bit immediates be pushed, but 64-bit registers can? So we have to `mov` the immediate into the register first and then `push`. 2 instructions instead of 1... – St.Antario Feb 09 '18 at 19:33
  • 1
    @St.Antario: Yes, 16-bit `push` is weird, except in 16-bit mode. It misaligns the stack, so you'd only use it for crazy hacks [like merging two 16-bit values into a 32-bit value with minimum code-size](https://codegolf.stackexchange.com/questions/78896/compute-the-adler-32-checksum/78972#78972), at the expense of performance. – Peter Cordes Feb 10 '18 at 01:31
  • 1
    @St.Antario: It's an instruction-encoding issue: the only instruction that takes an `imm64` is `mov`. Others, like `push`, `add`, `imul`, all take only a 32-bit (or 8-bit) immediate, because most constants are small. You don't want an 8-byte immediate just so you can add or push 1024. I'm not sure if there's an existing Q&A about why x86-64 is designed with mostly sign-extended-`imm32` (and sign-extended-`disp32` in addressing modes); I had a quick look but didn't find one when looking for more duplicates for https://stackoverflow.com/questions/48705762/pushing-imm32-ends-up-in – Peter Cordes Feb 10 '18 at 01:35
  • Does push ax (without operand-size prefix) work in 32-bit mode? If so, how will ax be extended to 32 bits? (Intel documentation only specifies extension for segment registers and immediate values) – Plastic Apr 24 '23 at 09:56
  • 1
    @Plastic: `push ax` implies an operand-size of 16-bit, the size of the register. Exactly like `push word ax`. It doesn't extend anything because the memory operand is only 16 bits wide. – Peter Cordes Apr 24 '23 at 10:05
  • So `push ax` in 32-bit mode will be encoded as 6650H (i.e. with 66H operand-size prefix)? While in 16-bit mode it will be encoded as 50H (i.e. without 66H operand-size prefix)? Is it not possible to use 50H encoding (without operand-size prefix) in 32-bit mode? – Plastic Apr 24 '23 at 10:28
  • 1
    @Plastic: `50h` in 32-bit mode is `push eax`, of course. – Peter Cordes Apr 24 '23 at 11:12

The "stack width" in a computer, which is the smallest amount of data that can be pushed onto the stack, is defined to be the register size of the processor. This means that if you are dealing with a processor with 16 bit registers, the stack width will be 2 bytes. If the processor has 32 bit registers, the stack width is 4 bytes. If the processor has 64 bit registers, the stack width is 8 bytes.

Don't be confused when using modern x86/x86_64 systems; if the system is running in a 32 bit mode, the stack width and register size are 32 bits or 4 bytes. If you switch to 64 bit mode, then and only then will the register and stack size change.

David Hoelzer
  • 2
    IIRC, in x86-64, you can push 2 byte values too, i.e. 1/4 of the stack width. – Rudy Velthuis Jul 16 '17 at 14:55
  • @RudyVelthuis is correct. See [this code-golf answer](https://codegolf.stackexchange.com/questions/78896/compute-the-adler-32-checksum/78972#78972) where I used that instead of shift/OR to merge things. Looking at it again now, my comments say that `push r16` modifies RSP by only 2, rather than padding to the "stack width". – Peter Cordes Jul 16 '17 at 14:57
  • 1
    Also, the first paragraph is not totally accurate: 16-bit mode on 386 and later still has 32-bit registers (accessible with an operand-size prefix), but `push` without an operand-size prefix still pushes 16 bits. – Peter Cordes Jul 16 '17 at 15:00
  • 1
    You *can* push a WORD in 32-bit or 64-bit mode, but you'll need the size override prefix on `PUSH`. I don't think any assembler is going to do that unless you specifically request it with a `WORD` or `WORD PTR` directive. A more precise wording in the first paragraph would be [the "operand-size attribute of the current code segment"](http://x86.renejeschke.de/html/file_module_x86_id_269.html), but that's admittedly clunky. It is just the register width that matches the current *operating* mode, so obviously if you're in 16-bit mode, it'll be 2 bytes, even though 32-bit registers are available. – Cody Gray - on strike Jul 16 '17 at 15:21
  • @CodyGray: The current wording no longer says anything about the operand-size attribute of the current code segment. `push` is different from non-stack instructions: its default operand-size in 64-bit is 64-bit, even without REX.W=1. Since `push` operand-size stuff is so weird, I decided this would be a good place to attempt a canonical answer. – Peter Cordes Jul 16 '17 at 22:55
  • @Peter I didn't know the wording had changed, but `PUSH` isn't the only instruction that has a default operand size of 64-bit in long mode even without the `REX` prefix. Near `CALL`, `RET`, `JMP`, and `Jcc` work that way, too, as do some of the more obscure instructions like `ENTER`, `LEAVE`, `LOOPcc`. Oh, and of course `POP`. I guess you'd say all of those are "stack" instructions, but I don't think this is nearly as confusing or weird an issue as you're making it! – Cody Gray - on strike Jul 17 '17 at 05:17
  • @CodyGray: interesting point about `LOOP`: the manual documents that it ignores `REX.W` (like I found that `push` does), and its address-size controls whether it uses cx/ecx/rcx. Weird things with `push`: NASM size specifiers don't do what you'd expect: `push dword 123` doesn't warn in 64-bit mode. Intel's manual seems to show (in the Operation section) that it's possible for `push` to decrement RSP by 4. But it's not, only 64 or 16, which seems odd. – Peter Cordes Jul 17 '17 at 05:29
  • @CodyGray: Just using `push` the normal way is not difficult, which is why I added the first section pretty soon after posting my answer. But since lots of things about `push` have puzzled me at various times, I figured that other people might have wondered about them, too. As David's answer shows, it's easy to mis-state the facts when trying to give a simple answer about `push`, especially if you want it to be applicable to all modes (I assume ignoring 32-bit regs in 16-bit mode was intentional). – Peter Cordes Jul 17 '17 at 05:33
  • Ah, I looked at the manuals again, and it looks like "operand-size attribute" is still used in the description of `PUSHF`/`PUSHD` and the pop equivalents, which *don't* work in 64-bit mode. So that's probably why the wording was dropped off in the latest versions, since it's not accurate for long mode. But you're right, it is very weird that the manual for `PUSH` has a bullet point that starts with "Operand size", where it says that a REX.W instruction prefix can be used to override the size. So it's not just the Description pseudo-code that's misleading. – Cody Gray - on strike Jul 17 '17 at 05:37
  • 2
    It's definitely wrong, though. There's no way to push a 32-bit value onto the stack in 64-bit long mode. The table is pretty clear about that not being a valid encoding. I suspect this is just one of those cases where it's difficult even for Intel to document clearly in their manuals, especially since they try this one-size-fits-all approach, where a single manual has to document decades of microprocessors and multiple instruction sets. The thing that *is* quite clear there is: *"in 64-bit mode, the size of the stack pointer is always 64 bits."* @Peter – Cody Gray - on strike Jul 17 '17 at 05:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/149346/discussion-between-peter-cordes-and-cody-gray). – Peter Cordes Jul 17 '17 at 05:41