28

I was reading The Art of Assembly Language (Randall Hyde) and I tried out a console application in that book. It was a program that created a new console for itself using Win32 API functions. The program contains a procedure called LENSTR, which stores the length of string in the EBP register. The code for this function is as follows:

LENSTR PROC
ENTER 0, 0
PUSH  EAX
;----------------------
CLD
MOV   EDI, DWORD PTR [EBP+08H]
MOV   EBX, EDI
MOV   ECX, 100 ; Limit the string length
XOR   AL, AL
REPNE SCASB ; Find the 0 character
SUB   EDI, EBX ; String length including 0
MOV   EBX, EDI

DEC   EBX
;----------------------
POP   EAX
LEAVE
RET   4
LENSTR ENDP

Could you explain the usage of the enter and leave commands here?

Peter Cordes
devjeetroy
  • I believe there is a substantially better, more detailed answer. You're free to reevaluate or choose whatever you think is best. – Evan Carroll Mar 18 '18 at 20:11
  • "which stores the length of string in the `EBP` register" Incorrect, it stores the length in `ebx` – ecm Mar 09 '21 at 17:39

3 Answers

57

Enter creates a stack frame, and leave destroys a stack frame. With the 0,0 parameters on the enter, they're basically equivalent to:

; enter
push ebp
mov ebp, esp

; leave
mov esp, ebp
pop ebp

Although it's not used in the code you posted, enter does support doing a bit more than the simple push/mov combination shown above. The first parameter to enter specifies an amount of space to allocate for local variables. For example, enter 5, 0 is roughly equivalent to:

push ebp
mov ebp, esp
sub esp, 5
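To see what those steps do to ESP and EBP, here is a small C model of the stack. It is only a sketch: the `Cpu` struct, `push32`, `pop32`, `enter0` and the starting register values are made up for illustration, and it models nothing beyond the two pointers and the stack memory. The second helper replays the call into LENSTR from the question (one pushed argument, a made-up return address, then ENTER 0, 0) to show why the argument is found at [EBP+8].

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: addresses are byte offsets into `mem`. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t esp;                    /* stack pointer (byte offset) */
    uint32_t ebp;                    /* frame pointer (byte offset) */
    uint32_t mem[STACK_BYTES / 4];   /* stack memory, one slot per dword */
} Cpu;

void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);             /* model only: toy stack must have room */
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop32(Cpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* enter size, 0  ==  push ebp / mov ebp, esp / sub esp, size */
void enter0(Cpu *c, uint16_t size) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= size;
}

/* leave  ==  mov esp, ebp / pop ebp */
void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop32(c);
}

/* Returns 1 if enter builds the expected frame and leave undoes it. */
int frame_round_trip(uint16_t size) {
    Cpu c = { .esp = 128, .ebp = 200 };   /* arbitrary starting values */
    enter0(&c, size);
    int frame_ok = c.ebp == 124 && c.esp == 124u - size && c.mem[124 / 4] == 200;
    leave(&c);
    return frame_ok && c.esp == 128 && c.ebp == 200;
}

/* Caller pushes one argument, CALL pushes a return address, then
   ENTER 0, 0 runs. Returns the dword the callee finds at [EBP+8]. */
uint32_t arg_seen_at_ebp_plus_8(uint32_t arg) {
    Cpu c = { .esp = 128, .ebp = 200 };
    push32(&c, arg);          /* caller: push the string pointer */
    push32(&c, 0x401000u);    /* call: made-up return address */
    enter0(&c, 0);
    return c.mem[(c.ebp + 8) / 4];
}
```

Called from a `main`, `frame_round_trip(5)` returns 1 and `arg_seen_at_ebp_plus_8(v)` returns `v`: the saved EBP sits at [EBP], the return address at [EBP+4], and the first argument at [EBP+8].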

Enter also supports languages like Pascal that can use nested functions/procedures:

procedure X;
    procedure Y;
    begin
        { ... }
    end;
begin
    { ... }
end;

In a case like this, Y has access not only to its own local variables, but also to all variables local to X. These can be nested further (enter itself handles up to 31 levels), so you could have a Z inside of Y that had access to its own local variables, the variables of Y and the variables of X. The second parameter to enter specifies the nesting depth. Level 0 builds no display (the list of frame pointers for the enclosing procedures) at all, so the outermost procedure starts at level 1: X would use enter Sx, 1, Y would use enter Sy, 2 and Z would use enter Sz, 3 (where Sx, Sy and Sz signify the size of variables local to X, Y and Z respectively).

This would create a chain of stack frames to give Z access to variables local to Y and X, and so on. This becomes fairly non-trivial if the functions are recursive, so an invocation of Z can't just walk up the stack to the two most recent stack frames--it needs to skip across stack frames from previous invocations of itself, and go directly back to stack frames for the lexical parent function/procedure, which is different from its caller in the case of recursion.
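Here is a rough C model of that frame chain, following the pseudocode for ENTER in Intel's manual as I understand it. Everything in it is an assumption made for illustration: the `Cpu` struct, the helpers `call_proc` and `display`, the dummy return address of 0, and the local sizes. The outermost procedure X is numbered level 1 here (Intel's convention), because a level-0 frame stores no frame pointers for inner procedures to copy.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 512 };

typedef struct {
    uint32_t esp, ebp;               /* byte offsets into mem */
    uint32_t mem[STACK_BYTES / 4];
} Cpu;

void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);             /* model only: toy stack must have room */
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* enter size, level */
void enter(Cpu *c, uint16_t size, uint8_t level) {
    level %= 32;                     /* only levels 0..31 are encodable */
    push32(c, c->ebp);               /* save the caller's frame pointer */
    uint32_t frame_temp = c->esp;
    if (level > 0) {
        for (uint8_t i = 1; i < level; i++) {
            c->ebp -= 4;             /* walk the previous frame's display */
            push32(c, c->mem[c->ebp / 4]);
        }
        push32(c, frame_temp);       /* append this frame's own pointer */
    }
    c->ebp = frame_temp;
    c->esp -= size;                  /* locals live below the display */
}

/* leave is the same at every nesting level */
void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = c->mem[c->esp / 4];
    c->esp += 4;
}

/* Models `call` followed by `enter size, level`; returns the new EBP. */
uint32_t call_proc(Cpu *c, uint16_t size, uint8_t level) {
    push32(c, 0);                    /* stand-in for the return address */
    enter(c, size, level);
    return c->ebp;
}

/* Display slot k (1 = outermost procedure) of a frame is at [ebp - 4*k]. */
uint32_t display(const Cpu *c, uint32_t ebp, unsigned k) {
    return c->mem[(ebp - 4 * k) / 4];
}

/* X -> Y -> Y (recursion) -> Z. Returns 1 if Z's display holds X's frame
   and the most recent Y frame, and leave then returns to that Y frame. */
int z_sees_lexical_parents(void) {
    Cpu c = { .esp = STACK_BYTES, .ebp = 0 };
    uint32_t x  = call_proc(&c, 8, 1);
    uint32_t y1 = call_proc(&c, 8, 2);
    uint32_t y2 = call_proc(&c, 8, 2);   /* Y calling itself */
    uint32_t z  = call_proc(&c, 8, 3);
    int ok = display(&c, z, 1) == x
          && display(&c, z, 2) == y2
          && display(&c, z, 3) == z
          && y1 != y2;
    leave(&c);                           /* tear down Z's frame */
    return ok && c.ebp == y2;
}
```

In this model, Z reaches X's variables through the pointer at [EBP-4] and Y's through [EBP-8], without walking the chain of saved EBP values. If Z then calls Y again, the new Y frame copies only X's pointer from Z's display, which is how the frames of earlier invocations get skipped.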

This complexity is also part of why standard C and C++ don't allow nested functions. Given the presence of enter/leave, they're fairly easy to support on Intel processors, but can be considerably more difficult on many other processors that lack such direct support.

This also at least helps explain one other...feature of enter--for the trivial case being used here (i.e., enter 0, 0) it's quite a bit slower than the equivalent using push/mov.

Jerry Coffin
  • 6
    @BlackBear: No, they're instructions, but they're kind of "shorthand" instructions -- you can accomplish the same without them. – Jerry Coffin May 02 '11 at 15:53
  • @devjeetroy Review all the answers and accept the one that best answered your question. There's a little check box near each answer, click on it to select the official answer to your question. – karlphillip May 02 '11 at 16:03
  • In the event where you are creating nested functions, would it be worth it to roll your own or use `enter`? I.e., if you need the functionality, is it faster? What about local scope, for instance in languages that localize variables to if statements (like `let`) internally, are these on their own stack, and could they use enter? Would it make more sense? – Evan Carroll Mar 18 '18 at 20:10
  • 1
    @EvanCarroll: If you're doing nested functions, you might as well use `enter` and `leave`. If memory serves, they're about the same speed as rolling your own, but save a bit on code space. For other scopes, it usually makes more sense to compute the size for the entire function and allocate that on entry, instead of allocating space for that scope as you enter it. If you want, you can also do that for functions as long as they're not (even indirectly) recursive. – Jerry Coffin Mar 18 '18 at 21:29
  • 4
    `enter` does not affect the status flags, unlike `sub esp, x`. A more accurate equivalent is `lea esp, [esp - x]` – ecm Mar 09 '21 at 17:37
16

This is the setup for the stack frame (activation record) for the function. Internally it normally looks something like this:

    push( ebp );         // Save a copy of the old EBP value
    mov( esp, ebp );     // Get ptr to base of activation record into EBP
    sub( NumVars, esp ); // Allocate storage for local variables.

    // ENTER with a non-zero immediate does all 3 of the above things, slowly.

Then when the stack frame is to be destroyed again, you have to do something along the following lines:

    mov( ebp, esp );    // Deallocate locals and clean up stack.
    pop( ebp );         // Restore pointer to caller's activation record.
    // LEAVE does the above two steps; a RET instruction is separate.
    ret();              // Return to the caller.

Here is a better explanation of it using HLA, though it is also well explained in the book you're reading; I have that book too and have read the section that covers it.

Peter Cordes
Tony The Lion
  • [This](http://x86.renejeschke.de/html/file_module_x86_id_154.html) and [this](http://x86.renejeschke.de/html/file_module_x86_id_78.html) may be more correct. – lzutao Mar 26 '17 at 07:14
  • 6
    You should omit ret() from your example, since `leave` doesn't perform ret – Aviv Apr 06 '18 at 16:55
1

Enter and leave just set up and tear down the stack frame. Usually compilers generate code that directly manipulates the stack frame pointers, as enter and leave aren't exactly fast relative to mov/sub (they used to be though, back in the 286 days :-) ).

Brian Knoblauch
  • 1
    No -- they were never faster, just more compact and versatile (they support Pascal-style nested functions). – Jerry Coffin May 02 '11 at 15:30
  • 4
    Interestingly, MASM32 (when local variables are employed) does the entry portion with PUSH/MOV/ADD, but destroys them with LEAVE... – Brian Knoblauch Sep 12 '11 at 17:04
  • 8
    right: `enter` has overhead to support nested functions, even when you don't use them. `leave` adds no overhead, and is a bit smaller than the equivalent mov/pop. – Jerry Coffin Sep 12 '11 at 17:10
  • That's the same choice GCC makes: `leave` if both mov and pop would have been needed (otherwise just pop ebp / rbp). But never `enter` because it's very slow. – Peter Cordes Mar 09 '21 at 15:22
  • 3
    `leave` is 3 total uops on Sandybridge-family, vs. 2 for `mov`/`pop`. I tested on Skylake with a function that set up RBP as a frame pointer then tore it down with `leave` vs. `mov/pop`, and calling the mov/pop function in a loop really was fewer total `uops_issued.any` for the front-end. So it's slightly more expensive, not just a matter of counting a stack-sync uop against it. – Peter Cordes Mar 09 '21 at 15:22