5

Me again.

I have a lot of `add esp, 4` instructions in my program and I'm trying to reduce its size. Is there a smaller instruction that can replace `add esp, 4`?

Stephen Canon
DavidH

8 Answers

6
pop edx  

Or any other integer register you don't mind destroying.

This is what modern compilers actually do (clang and sometimes gcc) because it's often optimal for performance as well as code-size on modern CPUs.

An add esp,4 after a call would force the CPU's stack engine to insert a stack-sync uop before doing the actual add. If you don't otherwise use ESP explicitly (e.g. as part of an addressing mode) before the next push/pop/call/ret, and touch it only through stack instructions, then you save a uop by using pop.

That cache line of stack memory is going to be hot in cache (making the load cheap) if any other stack instructions ran recently.
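To put numbers on the size difference, here is a small Python sketch that tallies the 32-bit encodings of the candidate instructions. The byte values are the standard encodings written out by hand. The helper names `size` and `bytes_saved`, and the figure of 100 call sites, are assumptions for illustration and do not come from the question.

```python
# Hand-written 32-bit machine-code encodings (an illustrative table, not an assembler).
ENCODINGS = {
    "add esp, 4": bytes([0x83, 0xC4, 0x04]),  # opcode 83 /0, ModRM C4 selects ESP, imm8
    "pop eax": bytes([0x58]),                 # 58+rd, EAX is register 0
    "pop ecx": bytes([0x59]),                 # ECX is register 1
    "pop edx": bytes([0x5A]),                 # EDX is register 2
}


def size(instruction: str) -> int:
    """Return the encoded length in bytes of an instruction from the table."""
    return len(ENCODINGS[instruction])


def bytes_saved(call_sites: int, old: str = "add esp, 4", new: str = "pop edx") -> int:
    """Bytes saved by replacing `old` with `new` at every call site."""
    return call_sites * (size(old) - size(new))


if __name__ == "__main__":
    for name, code in ENCODINGS.items():
        hex_bytes = " ".join(f"{b:02x}" for b in code)
        print(f"{name:<12} {hex_bytes:<10} {len(code)} byte(s)")
    print("saved over 100 call sites:", bytes_saved(100))
```

The tally covers code size only. The uop argument above is a separate consideration.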

Peter Cordes
jspcal
  • Can it be done with no other register involved? Like "pop"? – DavidH Jan 10 '10 at 20:13
  • 3
    @DavidH: No such instruction in the x86 family. A side effect of such a non-orthogonal structure is that every instruction is assumed to have a particular purpose. Fancy tricks like the DEC processors or 68K with `CMP (SP)+,(SP)+` to pop two values off with only minor side effects (setting condition codes) aren't possible because the x86 stack pointer is not a general use register. – wallyk Jan 10 '10 at 20:21
  • 1
    @wallyk: ESP *is* one of the 8 general-purpose integer registers. The difference is that x86 has no post-increment addressing modes. (`lods` implicitly uses ESI that way, but I'm talking about modes you can use with any register). Also, x86 doesn't allow instructions with 2 explicit memory operands. (A ModRM byte can only code for one register and one reg/mem operand, so most instructions have pairs of opcodes for `add r, r/m` and `add r/m, r`, allowing either memory src or dst to be encoded, but not both.) – Peter Cordes Jun 01 '19 at 07:57
5

A better question might be: "why do you have so many add esp, 4 instructions, and what can you do to have fewer of them?" It's somewhat unusual to be doing lots of small increments to the stack pointer like this.

Are you moving things to/from the stack at the same time? Could you use push/pop instead?

Alternatively, do you really need to update the stack pointer so frequently, or could you get away with moving it once at the beginning of a block of code to make some space on the stack, then restoring it once at the end of the routine?

What are you really trying to do?

Stephen Canon
  • They are from external function calls – DavidH Jan 10 '10 at 20:15
  • 2
    External function calls don't, on their own, require lots of small adjustments to the stack pointer. Are you adjusting the stack pointer to satisfy an ABI alignment requirement? As part of popping arguments off the stack after the call? – Stephen Canon Jan 10 '10 at 20:16
4

Sorry if this sounds trivial... but if you manage to reorder your code so that several add esp, 4 instructions are consecutive, you can of course merge them into one, e.g.:

add esp, 8

or even:

add esp, 12

Just make sure that the instructions you move don't reference esp or the stack; or, if they do reference something on the stack, that they do so only via the ebp register.
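The size effect of merging can be sketched in Python. `encode_add_esp` is a hypothetical helper that emits the standard encoding of `add esp, imm`: 3 bytes while the constant fits in a sign-extended 8-bit immediate, 6 bytes beyond that. `merged_saving` is likewise an illustrative name.

```python
def encode_add_esp(n: int) -> bytes:
    """Encode `add esp, n` for 32-bit mode (hypothetical helper)."""
    if -128 <= n <= 127:
        # 83 /0 ib: sign-extended 8-bit immediate, 3 bytes in total
        return bytes([0x83, 0xC4, n & 0xFF])
    # 81 /0 id: full 32-bit immediate, 6 bytes in total
    return bytes([0x81, 0xC4]) + (n & 0xFFFFFFFF).to_bytes(4, "little")


def merged_saving(count: int, step: int = 4) -> int:
    """Bytes saved by merging `count` separate adds of `step` into a single add."""
    separate = count * len(encode_add_esp(step))
    merged = len(encode_add_esp(count * step))
    return separate - merged


if __name__ == "__main__":
    print(encode_add_esp(8).hex())             # encoding of the merged add esp, 8
    print(merged_saving(2), merged_saving(3))  # savings for two and three merged adds
```

Each merged instruction saves the full 3 bytes of the one it absorbs, as long as the combined constant stays at or below 127.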

stakx - no longer contributing
2

Try using pop eax

alemjerus
2

One way to do it if you have multiple function calls:

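sub esp, 4                ; reserve one argument slot
mov dword [esp], param    ; store the argument for the first call
call ...
...
mov dword [esp], param2   ; overwrite the same slot for the next call
call ...
...
add esp, 4                ; release the slot once, at the end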

That is, reuse the stack allocated for the first parameter over several function calls.
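Whether this saves bytes depends on how the arguments are stored. Here is a rough tally in Python, assuming each argument is already in EAX (my assumption; the answer does not say) and counting only the argument-passing instructions, not the calls themselves. The sizes are the standard 32-bit encodings, and the function names are illustrative.

```python
PUSH_EAX = 1       # 50
ADD_ESP_4 = 3      # 83 C4 04
SUB_ESP_4 = 3      # 83 EC 04
MOV_SLOT_EAX = 3   # 89 04 24 (mov [esp], eax needs a SIB byte)


def push_and_clean(calls: int) -> int:
    """Overhead in bytes when every call does push eax + add esp, 4."""
    return calls * (PUSH_EAX + ADD_ESP_4)


def reuse_slot(calls: int) -> int:
    """Overhead in bytes when one slot is reserved once and rewritten per call."""
    return SUB_ESP_4 + calls * MOV_SLOT_EAX + ADD_ESP_4


if __name__ == "__main__":
    for calls in (2, 6, 10):
        print(calls, push_and_clean(calls), reuse_slot(calls))
```

Under these assumptions the shared slot breaks even at six calls and only comes out smaller beyond that. Storing an immediate with `mov dword [esp], imm32` takes 7 bytes, which tilts the comparison further toward `push`.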

Richard Pennington
1

If you are realigning the stack after a call, a better way to do it is to use RETN X in the callee, where X is the number of bytes to add to ESP...

PUSH EAX
CALL EBX    ; the called function returns with RETN 4
; here the stack is already aligned again

Or, use POPFD =x
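For the `RETN` variant, the code-size trade-off can be sketched in Python. The sizes are the standard encodings (`ret` is 1 byte, `ret 4` is 3 bytes, `add esp, 4` is 3 bytes). The function names and the single 4-byte argument are assumptions for illustration.

```python
RET = 1         # C3
RET_IMM16 = 3   # C2 04 00
ADD_ESP_4 = 3   # 83 C4 04


def caller_pops(call_sites: int) -> int:
    """Cleanup bytes when each call site runs add esp, 4 and the callee uses plain ret."""
    return call_sites * ADD_ESP_4 + RET


def callee_pops() -> int:
    """Cleanup bytes when the callee ends with ret 4; call sites need nothing extra."""
    return RET_IMM16


if __name__ == "__main__":
    for sites in (1, 5, 20):
        print(sites, caller_pops(sites), callee_pops())
```

The callee pays 2 extra bytes once and every call site saves 3, so this already comes out smaller with a single call site. It counts bytes only and says nothing about speed.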

stealthyninja
ptr0x
  • `popfd` is very slow; see [my comment](https://stackoverflow.com/questions/2038416/smaller-instruction-than-add-esp-4#comment99407647_2044924) on another answer. `ret imm16` decodes to an extra uop on some CPUs, so it may not actually help performance. And it costs 2 extra bytes of code-size vs. normal `ret`. (But only in the callee, not at each call site.) But sure, a callee-pops convention might be a good choice sometimes. – Peter Cordes Jun 01 '19 at 08:07
0

If you're managing a stack (I assume you are), you can use push eax and pop eax to add and remove values while keeping esp up to date. You can also use pusha/popa to push/pop all GPRs on/off the stack, and pushf/popf to push/pop the EFLAGS register on/off the stack.

Mike
0

popfd will add 4 to esp in just one byte, with the side effect of randomizing your flags. It might be slow to execute; I don't know.

Of course it would help to see code or know what your requirements really are.

Potatoswatter
  • `popf` is very slow, like 1 per 14 cycles on Nehalem, or 1 per 20 cycles on Skylake (https://agner.org/optimize/). And it can set or clear DF, so you'd also need `cld` to maintain DF=0 as required by most ABIs. In kernel code, other [important bits in EFLAGS](https://en.wikipedia.org/wiki/FLAGS_register) are modifiable, including IF (interrupt enable/disable) and TF (trap = single-step). These other flags are probably *why* `popf` has to be micro-coded; it's not used often enough for the decoders to decode it differently in user-space, I guess. TL;DR: terrible choice vs. `pop ecx`. – Peter Cordes Jun 01 '19 at 08:01