
First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.

I've been told that conceptually, the stack is like a Coke bottle. The sugar is at the bottom and you fill it up to the top. With this in mind, how does the `call` instruction tell the EIP register to "target" the called function if the EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "Code Segment of RAM" (the place where functions are kept) is the place where the EIP register is.

Austin Copeland
  • You are asking about the stack but in the diagram you appear to call it `ebp`. If that was intentional, I think I know what got you confused. – Jongware May 18 '15 at 21:24
  • Go to your school cafeteria and look at the contraption that holds the plates: the springy thing that pops up another plate when you take the top one. *That* is a stack. I have no idea who planted the Coke bottle in your head. – WhozCraig May 18 '15 at 21:26
  • @Jongware Yes, it was intentional. I thought the EBP was just some type of indicator of which frame is the current frame, which is why I wrote EBP 1, EBP 2, etc. When you make a call, isn't the function that is called the new EBP? – Austin Copeland May 18 '15 at 21:30
  • @WhozCraig I saw a model where inside a rectangle, there were segments called "Code / Instructions," "Stack," "Global," and "Heap." I thought all of this was happening inside the Stack Segment. – Austin Copeland May 18 '15 at 21:41
  • Why not start at [wikipedia](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))? That would actually be much better than what you seem to have in mind. The main point of confusion seems to be mixing up the data structure _stack_ with the hardware stack of a CPU (which is similar for most modern CPUs, not just x86). Perhaps the Wikipedia entry will clarify this. Note that the data structure differs only for efficiency reasons; it could very well be implemented much like the hardware stack. Just read the article. – too honest for this site May 18 '15 at 21:42
  • @AustinCopeland: No. If you look at a disassembled function (or, preferably, at a lot of them), you'll notice there is *one* stack pointer (`esp`); and each function typically *saves* `ebp`, *modifies* it for its own (local) purpose, then *restores* it. [Just found a worked-out example question](http://stackoverflow.com/questions/14185381/ebp-esp-and-stack-frame-in-assembly-with-nasm). – Jongware May 18 '15 at 21:43
  • @AustinCopeland after you are done with the other links, if you are serious about learning x86, then read [**The Art of Assembly Language Programming**](https://courses.engr.illinois.edu/ece390/books/artofasm/artofasm.html). Really, read it. It leaves little out. While it is older, it is still 100% applicable to x86 & x86_64, the primary differences being only the size of the registers and the extended register names (there are other differences, such as calling conventions, syscalls, etc., but the fundamentals are the same). – David C. Rankin May 18 '15 at 22:03
  • "IP" is short for instruction pointer, which is exactly what it does: it points to the next instruction the CPU will execute (the "E" before "IP" means "extended", for somewhat historical reasons). Similarly, "SP" stands for "stack pointer" and points to the "top of stack": the element pushed onto the stack last, which will be "popped" (taken) from the stack first. – too honest for this site May 18 '15 at 22:07
  • @DavidC.Rankin: Just a sidenote: that link is interesting for an overview, but a bit inexact. A byte is not necessarily 8 bits (that would be an "octet"), and a "word" is not clearly defined in size either (on ARM it is 32 bits; I'm not sure whether it is still 16 bits on ia32/x64 nowadays). Just a sidenote! – too honest for this site May 18 '15 at 22:11
  • @DavidC.Rankin I'll check it out after I get this all in my head. Thank you. – Austin Copeland May 18 '15 at 22:15
  • Of all of the resources you can find out on the net, that is by far one of the best for allowing you to teach yourself the ins & outs of x86 at the assembly level. – David C. Rankin May 19 '15 at 00:05
  • @DavidC.Rankin: That is really sad. Back in the '80s there were computer magazines which had courses on programming, assembler and all this. So much for "information at your fingertips." How are the younger ones supposed to learn the basics then? Maybe I'm just getting old, but I still think this is a must for good developers (hardware and software). – too honest for this site May 19 '15 at 00:16
  • Also, there is a data structure called a stack, which is a single-ended last-in-first-out (LIFO) queue, sometimes implemented as a linked list. That makes me think you may be confusing the call stack with a stack data structure... – Grady Player May 19 '15 at 00:44
  • @Olaf How correct you are. While there are still computer magazines around, they are all too full of ads to be worth a crap. Almost like your home phone, e-mail, the web, or countless other media that have been polluted to the point of worthlessness by marketing maggots. Why put out a good series of articles that take up space you could use to hawk the latest viagra ad or e-mail marketing firm? IBM-DeveloperWorks has pared back to a trickle of what it once was; DDJ, PCMag, etc. have all but vanished. Sad really. – David C. Rankin May 19 '15 at 02:02
  • @DavidC.Rankin: And obviously YouTube cannot cover this with video tutorials either. No wonder, as most of this requires diagrams and some text one should have available persistently until it is really in your mind. For magazines, I primarily thought of Byte or similar; I just read the German issue of Tech Review, which seems to be just another marketing spreader that does not go even 1mm (~1/25in) deep. – too honest for this site May 19 '15 at 11:41
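To tie the comments above back to the original question, here is a toy Python model. It is NOT real x86: the opcode tuples, the addresses, and the rule that every instruction occupies exactly one address are assumptions made for illustration. It shows that EIP and ESP are CPU registers holding *addresses*, not things stored "in" a segment: `call` pushes the return address via ESP and then loads EIP with the target address that is encoded in the instruction itself, and `ret` pops it back. The `enter`/`leave` operations are modeled on the x86 instructions of the same name (with nesting level 0) and show how each function saves, sets, and restores EBP while there is only one ESP.

```python
# Toy machine, not real x86. One flat address space (a dict) holds both the
# code and the stack; each instruction occupies one address; stack slots are
# 4 bytes. Opcode names and addresses are hypothetical.

class ToyCPU:
    def __init__(self, code, stack_top=0x1000):
        self.mem = dict(code)      # code and stack share the same address space
        self.eip = None            # address of the next instruction to execute
        self.esp = stack_top       # the single stack pointer
        self.ebp = stack_top       # frame pointer of the current function
        self.trace = []            # addresses executed, for inspection

    def push(self, value):
        self.esp -= 4              # x86-style: decrement first, then store
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def run(self, start):
        self.eip = start
        while True:
            op, *args = self.mem[self.eip]
            self.trace.append(self.eip)
            next_eip = self.eip + 1
            if op == "call":
                self.push(next_eip)        # save the return address on the stack
                next_eip = args[0]         # the target is part of the instruction
            elif op == "ret":
                next_eip = self.pop()      # resume at the saved return address
            elif op == "enter":            # push ebp; mov ebp, esp; sub esp, N
                self.push(self.ebp)
                self.ebp = self.esp
                self.esp -= args[0]
            elif op == "leave":            # mov esp, ebp; pop ebp
                self.esp = self.ebp
                self.ebp = self.pop()
            elif op == "halt":
                return
            self.eip = next_eip


PROGRAM = {
    0x100: ("call", 0x200),   # main: call func
    0x101: ("halt",),
    0x200: ("enter", 8),      # func: set up a frame with 8 bytes of locals
    0x201: ("leave",),        # tear the frame down again
    0x202: ("ret",),          # return to 0x101
}

cpu = ToyCPU(PROGRAM)
cpu.run(0x100)
print([hex(a) for a in cpu.trace])  # ['0x100', '0x200', '0x201', '0x202', '0x101']
```

After the run, ESP and EBP are back at `0x1000`: every `enter` is matched by a `leave` and every `call` by a `ret`, which is what keeps the single stack consistent across nested calls.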

1 Answer


Typically, a computer program uses four kinds of memory areas (also called sections or segments):

  • The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. It would better be called the "code" section, but the name has historical reasons.
  • The data section: This contains the variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
  • The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and "popping" them off again. This is also called "LIFO": last in, first out. This is where the local variables of a function reside. When a function completes, its data is removed from the stack and is (basically) lost.
  • The heap: This is also a dynamic memory region. The programming language provides special functions (in C, `malloc()`) which "allocate" (reserve) a piece of this area on request of the program. Another function (`free()` in C) returns the area to the heap when it is no longer required. As the data is released explicitly, it can be used to store data which lives longer than just a function call (unlike the stack).
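The difference in lifetime between stack data and heap data can be sketched with a toy Python model. Everything here is hypothetical: `ToyProcess`, `make_greeting`, and the integer "handles" are made up, and the `malloc`/`free` method names only mimic the C functions. A real runtime manages raw addresses, not Python dictionaries.

```python
# Hypothetical toy model of the stack and heap lifetimes (not how a real C
# runtime works). Method names mimic C's malloc/free for readability.

class ToyProcess:
    def __init__(self):
        self.stack = []          # one dict of local variables per active call
        self.heap = {}           # handle -> data; survives until freed
        self._next_handle = 1

    def call(self, func, *args):
        self.stack.append({})    # push a new frame for the callee
        try:
            return func(self, *args)
        finally:
            self.stack.pop()     # frame and its locals are discarded on return

    def set_local(self, name, value):
        self.stack[-1][name] = value

    def get_local(self, name):
        return self.stack[-1][name]

    def malloc(self, data):
        handle = self._next_handle
        self._next_handle += 1
        self.heap[handle] = data
        return handle

    def free(self, handle):
        del self.heap[handle]


def make_greeting(proc, name):
    proc.set_local("tmp", "Hello, " + name)    # local: lives in this frame only
    return proc.malloc(proc.get_local("tmp"))  # copy to the heap to outlive the call


p = ToyProcess()
h = p.call(make_greeting, "world")
print(p.stack)    # [] : the frame holding "tmp" is gone
print(p.heap[h])  # Hello, world : the heap copy outlived the call
p.free(h)         # released explicitly, as with free() in C
```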

The contents of the text and data sections are stored in the program file (on Linux, for example, they can be inspected with `objdump`; the section names carry a leading dot: `.text` and `.data`). The stack and heap are not stored anywhere in the file, as they are allocated dynamically (on demand) while the program runs.

Normally, after the program has been loaded, the remaining memory area is treated as a single large block in which both the stack and the heap are located. They start from opposite ends of that area and grow towards each other. On most architectures, the heap grows from low to high memory addresses (ascending) and the stack grows downwards (descending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt (change foreign data in) the heap or vice versa. This may result in any kind of error, depending on how and what data has changed. If the stack gets corrupted, the program may go wild (this is actually one way a trojan might work). Modern operating systems, however, should take measures to detect this situation before it becomes critical.
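The "grow towards each other" layout can be sketched as a toy Python model of one shared free block, with the heap bumping upward from the low end and the stack pointer moving downward from the high end. The class name and addresses are hypothetical, and the explicit collision check stands in for OS-level protection (for example guard pages); plain hardware performs no such check, which is exactly why the corruption described above can go unnoticed.

```python
# Toy model: heap and stack share one free block [low, high) and grow toward
# each other. The explicit check stands in for OS protection; real hardware
# would silently let the two regions overlap.

class SharedRegion:
    def __init__(self, low, high):
        self.heap_top = low     # next free address for the heap (grows upward)
        self.stack_ptr = high   # current stack pointer (grows downward)

    def _check(self, heap_top, stack_ptr):
        if heap_top > stack_ptr:
            raise MemoryError("heap and stack would overlap: out of memory")

    def heap_alloc(self, size):
        self._check(self.heap_top + size, self.stack_ptr)
        addr = self.heap_top
        self.heap_top += size
        return addr

    def stack_push(self, size):
        self._check(self.heap_top, self.stack_ptr - size)
        self.stack_ptr -= size
        return self.stack_ptr

    def free_bytes(self):
        return self.stack_ptr - self.heap_top


region = SharedRegion(0x1000, 0x1100)   # 256 bytes shared by heap and stack
print(hex(region.heap_alloc(0x80)))     # 0x1000
print(hex(region.stack_push(0x80)))     # 0x1080
print(region.free_bytes())              # 0
try:
    region.heap_alloc(1)                # no room left between the two
except MemoryError as exc:
    print(exc)
```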

This applies not only to x86 but also to most other CPU families and operating systems, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS-X, iOS, Android (which is based on Linux), DOS. For microcontrollers, there is often no heap (all memory is allocated statically, before run time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.


Disclaimer: This is very simplified, so please no comments like "how about bss, const, myspecialarea" ;-) There is also no requirement in the C standard for these areas; specifically, it requires neither a heap nor a stack. Indeed there are implementations which use neither. Those are mostly embedded systems with small (8 or 16 bit) MCUs or DSPs. Also, modern architectures often use CPU registers instead of the stack to pass parameters and keep local variables. How this is done is defined in the Application Binary Interface (ABI) of the target platform.


For the stack, you might read the Wikipedia article. Note the difference in implementation between the data structure "stack" and the "hardware stack" as implemented in a typical (micro)processor.
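That difference can be illustrated with two toy Python implementations of the same LIFO behaviour. The class names are hypothetical. `LinkedStack` is the data structure Grady Player's comment mentions: nodes can live anywhere in memory and are unlinked on pop. `HardwareStyleStack` mimics a CPU stack: a fixed block reserved up front, a single stack pointer that is decremented before each store (as x86 `push` does), and a pop that only moves the pointer, leaving the old value in memory until it is overwritten.

```python
# Two toy implementations of the same LIFO behaviour.

class LinkedStack:
    """Data-structure stack: nodes can live anywhere in memory."""

    class _Node:
        __slots__ = ("value", "below")

        def __init__(self, value, below):
            self.value = value
            self.below = below

    def __init__(self):
        self._top = None

    def push(self, value):
        self._top = self._Node(value, self._top)   # new node points at the old top

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        node = self._top
        self._top = node.below                     # the node is simply unlinked
        return node.value


class HardwareStyleStack:
    """CPU-style stack: a fixed block of memory plus one stack pointer."""
    WORD = 4

    def __init__(self, size_words):
        self.memory = [0] * size_words             # reserved up front, fixed size
        self.sp = size_words * self.WORD           # empty, descending stack

    def push(self, value):
        if self.sp == 0:
            # A real CPU would not check this; it would write past the block.
            raise OverflowError("stack overflow")
        self.sp -= self.WORD                       # decrement first, then store
        self.memory[self.sp // self.WORD] = value

    def pop(self):
        value = self.memory[self.sp // self.WORD]
        self.sp += self.WORD                       # only SP moves; value stays in memory
        return value


for stack in (LinkedStack(), HardwareStyleStack(8)):
    for v in (1, 2, 3):
        stack.push(v)
    print([stack.pop() for _ in range(3)])         # [3, 2, 1] for both
```

Both print the same order, which is the point of the abstract data type; the hardware-style version also shows why popped data is only "basically" lost: it is still in memory until the next push overwrites it.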

too honest for this site