Smaller instruction than "add esp, 4"

Question

Me again.

I have a lot of "add esp, 4" instructions in my program and I'm trying to reduce its size. Is there a smaller instruction that can replace "add esp, 4"?

Can you post a small example so we can see more precisely what you're actually trying to do? — Stephen Canon, Jan 10 '10 at 20:17

score 6 · Answer 1 · edited Jun 01 '19 at 08:17


pop edx

Or any other integer register you don't mind destroying.

This is what modern compilers actually do (clang and sometimes gcc) because it's often optimal for performance as well as code-size on modern CPUs.

An add esp,4 after a call would force the CPU's stack engine to insert a stack-sync uop before doing the actual add. If nothing else reads or writes ESP explicitly (e.g. as part of an addressing mode) before the next push/pop/call/ret, then you save a uop by using pop instead.

That cache line of stack memory is going to be hot in cache (making the load cheap) if any other stack instructions ran recently.
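To make the size difference concrete, here is a minimal sketch of a 32-bit Linux program in NASM syntax that cleans up one stack argument both ways. The callee `add_one`, the file name, and the build commands are assumptions made for illustration; the byte encodings in the comments are the standard ones for these instructions.

```nasm
; Assumed build: nasm -f elf32 demo.asm && ld -m elf_i386 -o demo demo.o
bits 32
global _start

section .text
; Hypothetical cdecl callee: returns its one stack argument plus 1 in EAX.
add_one:
    mov  eax, [esp+4]       ; the argument sits just above the return address
    inc  eax
    ret                     ; cdecl: the caller removes the argument

_start:
    push dword 41
    call add_one
    add  esp, 4             ; 83 C4 04 (3 bytes): the usual cleanup

    push dword 41
    call add_one
    pop  ecx                ; 59 (1 byte): same ESP adjustment, clobbers ECX

    mov  ebx, eax           ; exit status = return value (42)
    mov  eax, 1             ; Linux int 0x80 sys_exit
    int  0x80
```

ECX is used as the scratch register here rather than EAX because EAX holds the return value; in the usual 32-bit calling conventions ECX and EDX are call-clobbered anyway, so nothing of value is lost.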

edited Jun 01 '19 at 08:17

Peter Cordes


answered Jan 10 '10 at 20:11

jspcal


Can it be done with no other register involved? Like "pop " – DavidH Jan 10 '10 at 20:13
3

@DavidH: No such instruction in the x86 family. A side effect of such a non-orthogonal structure is that every instruction is assumed to have a particular purpose. Fancy tricks like the DEC processors or 68K with `CMP (SP)+,(SP)+` to pop two values off with only minor side effects (setting condition codes) aren't possible because the x86 stack pointer is not a general use register. – wallyk Jan 10 '10 at 20:21
1

@wallyk: ESP *is* one of the 8 general-purpose integer registers. The difference is that x86 has no post-increment addressing modes. (`lods` implicitly uses ESI that way, but I'm talking about modes you can use with any register). Also, x86 doesn't allow instructions with 2 explicit memory operands. (A ModRM byte can only code for one register and one reg/mem operand, so most instructions have pairs of opcodes for `add r, r/m` and `add r/m, r`, allowing either memory src or dst to be encoded, but not both.) – Peter Cordes Jun 01 '19 at 07:57

Stephen Canon · Answer 2 · 2010-01-10T21:41:25.697

5

A better question might be: "why do you have so many add esp, 4 instructions, and what can you do to have fewer of them?" It's somewhat unusual to be doing lots of small increments to the stack pointer like this.

Are you moving things to/from the stack at the same time? Could you use push/pop instead?

Alternatively, do you really need to update the stack pointer so frequently, or could you get away with moving it once at the beginning of a block of code to make some space on the stack, then restoring it once at the end of the routine?

What are you really trying to do?

edited Jan 10 '10 at 21:41

answered Jan 10 '10 at 20:14

Stephen Canon


They are from external function calls – DavidH Jan 10 '10 at 20:15
2

External function calls don't, on their own, require lots of small adjustments to the stack pointer. Are you adjusting the stack pointer to satisfy an ABI alignment requirement? As part of popping arguments off the stack after the call? – Stephen Canon Jan 10 '10 at 20:16

score 4 · Answer 3 · answered Jan 10 '10 at 20:45

Sorry if this sounds trivial, but if you manage to reorder your code so that several add esp, 4 instructions are consecutive, you can of course merge them into one, e.g.:

add esp, 8

or even:

add esp, 12

Just make sure that the instructions you move don't reference esp or the stack; or, if they do reference something on the stack, that they do so only via the ebp register.
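As an illustration of merging the cleanups, here is a minimal sketch in NASM syntax for 32-bit Linux in which two calls each push one argument and a single adjustment follows the second call. The callee `twice` and the build setup are assumptions for the example, not something from the question.

```nasm
; Assumed build: nasm -f elf32 demo.asm && ld -m elf_i386 -o demo demo.o
bits 32
global _start

section .text
; Hypothetical cdecl callee: returns its single stack argument doubled.
twice:
    mov  eax, [esp+4]
    add  eax, eax
    ret

_start:
    push dword 10
    call twice              ; EAX = 20, argument left on the stack for now
    mov  esi, eax           ; keep the first result in a call-preserved register

    push dword 11
    call twice              ; EAX = 22
    add  esp, 8             ; 83 C4 08: one 3-byte cleanup for both arguments

    lea  ebx, [esi+eax]     ; exit status = 20 + 22 = 42
    mov  eax, 1             ; Linux int 0x80 sys_exit
    int  0x80
```

Nothing between the two calls addresses the stack relative to ESP, which is what makes deferring the first cleanup safe here.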

alemjerus · Answer 4 · 2010-01-10T20:41:14.737

2

Try using pop eax

edited Jan 10 '10 at 20:41

answered Jan 10 '10 at 20:11

alemjerus


score 2 · Answer 5 · answered Jan 10 '10 at 20:28


One way to do it if you have multiple function calls:

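sub esp, 4               ; reserve one argument slot up front
mov dword [esp], param   ; store the argument for the first call
call ...
...
mov dword [esp], param2  ; overwrite the same slot for the next call
call ...
...
add esp, 4               ; release the slot once, after the last call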

That is, reuse the stack allocated for the first parameter over several function calls.

answered Jan 10 '10 at 20:28

Richard Pennington


This is the code-gen strategy used by `gcc -faccumulate-outgoing-args`. – Peter Cordes Jun 01 '19 at 08:03

score 1 · Answer 6 · edited Feb 20 '12 at 18:33


If you are adjusting the stack after a call (to remove the arguments), a better way is to have the callee use RETN X, where X is the number of bytes to add to ESP:

PUSH EAX
CALL EBX    ; the function called here returns with RETN 4
; here the stack is already adjusted

Or, use POPFD =x
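For the `RETN 4` approach, a callee-pops function could look roughly like the following sketch in NASM syntax for 32-bit Linux. The function name `add_one_callee_pops` and the build setup are assumptions for illustration; the ESP bookkeeping in EDI is only there to show that the call site needs no cleanup.

```nasm
; Assumed build: nasm -f elf32 demo.asm && ld -m elf_i386 -o demo demo.o
bits 32
global _start

section .text
; Hypothetical callee-pops function: returns its stack argument plus 1.
add_one_callee_pops:
    mov  eax, [esp+4]
    inc  eax
    ret  4                  ; C2 04 00 (3 bytes): pop return address, then ESP += 4

_start:
    mov  edi, esp           ; remember ESP before the call sequence
    push dword 41
    call add_one_callee_pops
                            ; no add/pop needed here: the callee removed the argument
    sub  edi, esp           ; 0 if the stack is balanced again
    lea  ebx, [eax+edi]     ; exit status = 42 + 0
    mov  eax, 1             ; Linux int 0x80 sys_exit
    int  0x80
```

Plain `ret` is the single byte C3, so the callee grows by 2 bytes once, while every call site drops its cleanup instruction.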

edited Feb 20 '12 at 18:33

stealthyninja


answered Jan 21 '11 at 19:37

ptr0x


`popfd` is very slow; see [my comment](https://stackoverflow.com/questions/2038416/smaller-instruction-than-add-esp-4#comment99407647_2044924) on another answer. `ret imm16` decodes to an extra uop on some CPUs, so it may not actually help performance. And it costs 2 extra bytes of code-size vs. normal `ret`. (But only in the callee, not at each call site). But sure, a callee-pops convention might be a good choice sometimes. – Peter Cordes Jun 01 '19 at 08:07

score 0 · Answer 7 · answered Jan 10 '10 at 20:12

If you're managing a stack (I assume you are), you can use push eax and pop eax to put values on the stack and take them off again, which keeps esp up to date automatically. You can also use instructions such as pusha/popa to push/pop all GPRs on/off the stack and pushf/popf to push/pop the EFLAGS register on/off the stack.

score 0 · Answer 8 · answered Jan 11 '10 at 21:04


popfd will add 4 to esp in just one byte, with the side effect of randomizing your flags. It might be slow to execute; I don't know.

Of course it would help to see code or know what your requirements really are.

answered Jan 11 '10 at 21:04

Potatoswatter


`popf` is very slow, like 1 per 14 cycles on Nehalem, or 1 per 20 cycles on Skylake (https://agner.org/optimize/). And it can set or clear DF, so you'd also need `cld` to maintain DF=0 as required by most ABIs. In kernel code, other [important bits in EFLAGS](https://en.wikipedia.org/wiki/FLAGS_register) are modifiable, including IF (interrupt enable/disable) and TF (trap = single-step). These other flags are probably *why* `popf` has to be micro-coded; it's not used often enough for the decoders to decode it differently in user-space, I guess. TL;DR: terrible choice vs. `pop ecx`. – Peter Cordes Jun 01 '19 at 08:01
