How to avoid using PUSH without POP?

Question

I'm currently writing x86 assembly (FASM) by hand and one typical mistake I often make is to push an argument on the stack, but return before the pop is executed.

This causes the stack offset to change for the caller, which will make the program crash.

This is a rough example to demonstrate it:

proc MyFunction
    ; A loop:
    mov     ecx, 100
.loop:
    push    ecx

    ; ==== loop content
    ...
    ; Somewhere, the decision is made to return, not just to exit the loop
    jmp    .ret
    ...
    ; ==== loop content

    pop     ecx
    loop    .loop

.ret:
    ret
endp

Now, the obvious answer is to pop the proper number of elements off the stack, before issuing a ret. However, it's easy to overlook something in 1000+ lines of handcrafted assembly.

I was also thinking about using pushad / popad always, but I'm not sure what the convention is for that.

Question: Is there any pattern that I could follow to avoid this issue?

Use standard stack frame (`push ebp; mov ebp, esp` at the start, `mov esp, ebp; pop ebp` or `leave` at the end). But if you can't keep track of your pushes you can't make use of the stuff anyway so you should just pay more attention :) — Jester, Jun 24 '21 at 14:32
You can use the normal [function prologue and epilogue](https://en.wikipedia.org/wiki/Function_prologue) to restore `esp`. But then the registers you pushed won't be restored. In assembly you have to do things right, there aren't really any high-level tricks. — interjay, Jun 24 '21 at 14:33
One possibility is to first push all callee-saved registers on the stack and then establish a stack frame. While not compliant with standard conventions, this allows you to clean up the stack correctly even if you lost track of what you pushed. — fuz, Jun 24 '21 at 14:53
I have solved this with **macros** which replace the native `PROC`/`ENDP`. They hide prologue and epilogue of calling convention and also they allow to give formal descriptive names to procedure's parameters, local stack variables and to registers returning the result value, see [Procedure](https://euroassembler.eu/maclib/stdcal32.htm#Procedure). — vitsoft, Jun 24 '21 at 15:13
If you can't keep track of your pushes and pops, don't program in assembler. — TonyK, Jun 24 '21 at 15:36

score 4 · Accepted Answer · edited Jul 08 '21 at 20:48

Normally don't use push/pop inside loops; use mov like a compiler would so you're not moving ESP around unnecessarily. (That can lead to extra stack-sync uops if/when you reference ESP explicitly for other locals.)

Or in this case, just pick a different register for your two different loops, or fully keep the outer loop counter in memory after reserving some space. (sub dword [esp], 1 / jnz .outer_loop. Or [ebp-4] if you set up EBP as a frame pointer instead of just using it as another call-preserved register.)

Spilling/reloading a register around something inside a loop is inefficient. Your first step in freeing up registers should be to keep read-only things in memory, if they're not needed extremely often. e.g. an outer loop counter like inc edx / cmp edx, [esp+12] / jbe .outer_loop avoids a store/reload. Only keep mutable things in memory when you run out of registers, and then of course prefer things that aren't changed often.

In compiler-generated code, you'll normally only see pushes in the prologue, and pops along paths that lead to a ret. That makes it easy to match them up. If you need to save another call-preserved register for use inside the function, or reserve more stack space for locals, you change the sequence of pushes at the top of the function, and then change the epilogue in the return path(s).

(You can have more than one way out of a function, especially if there's not much cleanup needed then tail duplication can be better than a jmp to the other copy of the epilogue.)

You don't have to be as rigidly disciplined (or braindead) as a compiler, after all, you're writing by hand in asm to get better performance. (right? Otherwise just let a compiler do the micro-optimization for you in generating "thousands of lines" of asm! Medium to large amounts of code are where compilers really shine in their ability to quickly analyze data flow and make pretty decent code.)

So you can for example use the asm stack as a stack data structure; something you can't convince a compiler to do. (Using the callstack to implement a stack data structure in C? is an unsafe attempt though.) Like push and pop, with "empty" detection via a pointer compare. In that case you'd want to be using EBP as a frame pointer, if you have any other need for stack memory.

Reading your post and the ones you linked, I think what I got wrong mostly is to store a loop counter in `ecx` consistently, rather than just simply on the stack. I will do this for tight loops only. I also found some places where I push an API return value on the stack to pop it after another API call. I will store those values on the stack and stick to using push/pop in the prologue/epilogue exclusively. Thank you for your explanation and the links you provided! Very interesting to read :) — bytecode77, Jun 25 '21 at 06:23
@bytecode77: The `loop` instruction is [slow anyway, except on AMD](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently), so there's no advantage to using ECX (unless optimizing for code-size over speed, or specifically for AMD in cases where ECX is convenient). Use a call-preserved register like EBX for a loop counter in a loop that contains function calls, that's the point of call-preserved regs. (Unless you need all the regs for even "hotter" variables) — Peter Cordes, Jun 25 '21 at 06:44

How to avoid using PUSH without POP?

1 Answers1