
As far as I know, when a process allocates local variables, it does so by pushing them onto memory as a stack, but still accesses them as random-access memory by using an offset from the stack pointer to reference them (from the thread "What is the idea behind using a stack for local variables?").

However, how does it know which variables have what offset? Am I thinking about this in the right way?

Peter Cordes
  • [This](https://cs.stackexchange.com/questions/76871/how-are-variables-stored-in-and-retrieved-from-the-program-stack) might be helpful. – Eli Sadoff Feb 14 '18 at 18:56
  • It's not the process itself; the compiler determines which variables are stored on the stack and which are not. The compiler may not even use the stack at all. – Pablo Feb 14 '18 at 18:57
  • Tagged [tag:assembly] because ISO C doesn't require implementations to use a stack to implement automatic storage; using the call-stack for locals is an implementation detail common to all mainstream C implementations on "normal" register-machine CPU architectures, but not part of the C language itself. (Using a separate data stack would also work easily (C setjmp/longjmp semantics are stack-like), but tie up another register, for the benefit of making return-address overwriting impossible with buffer overflows) – Peter Cordes Feb 14 '18 at 22:29

3 Answers


Offsets of local variables are "baked into" the machine code as constants. By the time the compiler is done, the things your program referred to as local variables have been replaced with fixed offsets from the stack pointer, assigned by the compiler.

Let's say you declare three local variables:

```c
char a[8];
int b;
short c;
```

The compiler assigns offsets to these variables: `a` is at offset 0, `b` is at offset 8, and `c` is at offset 12 (assuming a 4-byte `int`). Let's say your code does `b += c`. The compiler translates this into a block of code that looks like this:

```
LOAD    @(SP+8)     ; accumulator = b
ADD     @(SP+12)    ; accumulator += c
STORE   @(SP+8)     ; b = accumulator
```

The only value that can differ at run time is SP (the stack pointer). All offsets are numeric constants.
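To make the "baked-in constants" idea concrete, here is a small C sketch that models the frame as a plain byte array and `SP` as a pointer to its start. The offsets 0, 8 and 12 and the 16-byte frame size are the illustrative values from this answer (assuming a 4-byte `int`, modeled as `int32_t`); a real compiler may order, pad or align locals differently, and the helper names (`load32`, `add_c_to_b`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout for the three locals above (assumed, not
   what any particular compiler emits):
   a: bytes 0..7, b: bytes 8..11, c: bytes 12..13, padded to 16. */
enum { OFF_A = 0, OFF_B = 8, OFF_C = 12, FRAME_SIZE = 16 };

static_assert(OFF_C + (int)sizeof(int16_t) <= FRAME_SIZE, "c must fit in the frame");

/* Read a 32-bit value at a constant offset from the simulated SP. */
static int32_t load32(const unsigned char *sp, int off) {
    int32_t v;
    memcpy(&v, sp + off, sizeof v);
    return v;
}

/* Read a 16-bit value at a constant offset from the simulated SP. */
static int16_t load16(const unsigned char *sp, int off) {
    int16_t v;
    memcpy(&v, sp + off, sizeof v);
    return v;
}

/* Write a 32-bit value at a constant offset from the simulated SP. */
static void store32(unsigned char *sp, int off, int32_t v) {
    memcpy(sp + off, &v, sizeof v);
}

/* b += c, written the way the generated code sees it: only SP varies,
   the offsets are compile-time constants. */
static void add_c_to_b(unsigned char *sp) {
    int32_t acc = load32(sp, OFF_B);   /* LOAD  @(SP+8)  */
    acc += load16(sp, OFF_C);          /* ADD   @(SP+12) */
    store32(sp, OFF_B, acc);           /* STORE @(SP+8)  */
}
```

For example, with `b` set to 40 and `c` set to 2 in a zeroed `frame[FRAME_SIZE]`, calling `add_c_to_b(frame)` leaves 42 at offset 8. Passing a different array (a different "SP") reuses the same constant offsets.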

Sergey Kalinichenko
  • So for the duration of the function (or local scope in general) where the local variables are being used, `SP` will be the same, and to the processor `a` is essentially just `SP`, `b` is `@(SP+8)` and `c` is `@(SP+12)`? – Christian Bouwense Feb 14 '18 at 19:12
  • That is correct. Of course, the exact numerical constants depend on the sizes and alignment requirements of the variables. The example is for a platform where the size of an `int` is 4 bytes. – kfx Feb 14 '18 at 19:15
  • @ChristianBouwense That's right, SP would change only when you call another function, at which point the current function is "on hold". Once the other function returns, `SP` is restored to its value before the call, so the current function can access its local variables at their current offsets again. – Sergey Kalinichenko Feb 14 '18 at 19:19
  • @ChristianBouwense The compiler is allowed to make things a little more complicated by letting variables share the same offset at different times. Let's say you declare two variables, `i` and `j`. Then you use `i` before the first use of `j`, and stop. After that you start using `j`, and never use `i` again. This is common when `i` and `j` are indexes of two back-to-back loops. In situations like that the compiler is allowed to assign `i` and `j` the same offset. – Sergey Kalinichenko Feb 14 '18 at 19:22
  • @ChristianBouwense: SP can change (e.g. while pushing args before a function call in a stack-args calling convention), but the compiler is in control of such changes and can always calculate the right offset to put in the machine code. You could see this on x86, where many of the 32-bit calling conventions don't pass any args in registers, so all calls to non-void functions push some args (or reserve space ahead of time and simply store to space right above the stack pointer). See both ways for `gcc -O3` for 32-bit x86: https://godbolt.org/g/tbxzNY. – Peter Cordes Feb 14 '18 at 22:43
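The slot sharing described in the comment about `i` and `j` can be sketched as a greedy assignment over live ranges. This is a simplified, hypothetical model (equally sized locals, live ranges given as first/last instruction index and sorted by start), not how any particular compiler implements it; a variable's byte offset would then be its slot index times the slot size.

```c
#include <assert.h>

/* Hypothetical live range of a local: first and last instruction
   index at which it is used. */
typedef struct {
    int start;
    int end;
} LiveRange;

enum { MAX_SLOTS = 16 };

/* Greedy slot assignment for equally sized locals whose ranges are
   sorted by start: reuse a slot whose previous occupant's range ended
   before this one starts, otherwise open a new slot.
   Writes each variable's slot index to out[]; returns the slot count. */
static int assign_slots(const LiveRange *vars, int n, int *out) {
    int slot_end[MAX_SLOTS];
    int nslots = 0;
    for (int v = 0; v < n; v++) {
        int chosen = -1;
        for (int s = 0; s < nslots; s++) {
            if (slot_end[s] < vars[v].start) {   /* slot is free again */
                chosen = s;
                break;
            }
        }
        if (chosen < 0) {
            assert(nslots < MAX_SLOTS);
            chosen = nslots++;
        }
        slot_end[chosen] = vars[v].end;
        out[v] = chosen;
    }
    return nslots;
}
```

With `i` live over instructions 0-5 and `j` over 6-10 (two back-to-back loops), both land in slot 0, i.e. the same offset; if the ranges overlapped, they would get separate slots.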

Preface: The following text uses the x86 architecture as an example. Other architectures may handle things differently.

> [...] it does so by pushing them into memory as a stack, [...]

That's close. It does so by pushing them into memory *on the* stack [of the current process]. Every process (more precisely, every thread) has its own stack. Therefore, with every context switch, the current stack frame changes, and so do the local variables on it.

Usually(!), locally defined variables are referenced relative to the base of the current stack frame, which is held in the EBP register. By contrast, globally defined variables are referenced relative to the Data Segment Base. So every process has its own stack with its own local variables.

Newer compilers can omit the frame pointer (e.g. GCC's `-fomit-frame-pointer`) and reference the variables relative to the ESP register instead. This has two consequences:

  • one more register (EBP) is available for general use
  • one less aid for debugging: debuggers often used the EBP value as a reference to the current stack frame to locate local variables, so omitting it makes debugging harder unless separate debugging information is available.
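A toy simulation of a 32-bit x86-style stack can show why EBP-relative offsets are easier to track: after the classic prologue (`push ebp; mov ebp, esp; sub esp, 8`), a local at `[ebp-4]` keeps that offset, while its ESP-relative offset changes every time something is pushed. The `Cpu` struct, the 64-byte stack and the helper names are hypothetical; addresses are byte indices into an array rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack that grows downward like x86: push subtracts 4
   from esp. Addresses are byte indices into mem (hypothetical model). */
enum { STACK_BYTES = 64 };

typedef struct {
    unsigned char mem[STACK_BYTES];
    int esp;
    int ebp;
} Cpu;

static void push32(Cpu *cpu, int32_t v) {
    assert(cpu->esp >= 4);   /* toy stack-overflow check */
    cpu->esp -= 4;
    memcpy(cpu->mem + cpu->esp, &v, sizeof v);
}

static void mem_store32(Cpu *cpu, int addr, int32_t v) {
    memcpy(cpu->mem + addr, &v, sizeof v);
}

static int32_t mem_load32(const Cpu *cpu, int addr) {
    int32_t v;
    memcpy(&v, cpu->mem + addr, sizeof v);
    return v;
}

/* Classic prologue: push ebp; mov ebp, esp; sub esp, 8
   (room for two 4-byte locals). */
static void prologue(Cpu *cpu) {
    push32(cpu, cpu->ebp);
    cpu->ebp = cpu->esp;
    cpu->esp -= 8;
}

/* Offset of addr as it would appear in an [esp+N] operand. */
static int esp_offset(const Cpu *cpu, int addr) {
    return addr - cpu->esp;
}
```

Starting from `esp = 64`, a local stored at `ebp - 4` is `[esp+4]` right after the prologue and `[esp+8]` after one more push, while it stays `[ebp-4]` throughout. The compiler knows every push it emits, so it can always compute the right `[esp+N]`; EBP simply keeps the offsets constant, which is what made it convenient for debuggers.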

So, to answer your main question:

> How does a process keep track of its local variables

Processes keep track of their stack frame (which contains the local variables), but not of the local variables themselves. And the active stack frame changes with each process switch. The local variables are merely referenced relative to the frame pointer kept in the EBP register (or relative to the stack pointer ESP, depending on the compiler settings).
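A toy model of the point about process switches: the "compiled" function only knows the constant offset `[ebp-4]`, and which memory that reaches depends on whose saved frame pointer is current. `Task`, `current`, `context_switch` and the chosen EBP values are hypothetical; a real context switch saves and restores the full register state (including ESP and EBP) in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: each task (process or thread) owns a stack and a saved
   frame pointer. LOCAL_X is the constant offset the code uses. */
enum { STACK_BYTES = 64, LOCAL_X = -4 };

typedef struct {
    unsigned char stack[STACK_BYTES];
    int ebp;   /* frame pointer, as a byte index into stack */
} Task;

static Task *current;   /* what a context switch changes */

static void context_switch(Task *next) {
    /* A real kernel would save/restore all registers here. */
    current = next;
}

/* The function body only knows the constant offset [ebp-4]. */
static void set_x(int32_t v) {
    memcpy(current->stack + current->ebp + LOCAL_X, &v, sizeof v);
}

static int32_t get_x(void) {
    int32_t v;
    memcpy(&v, current->stack + current->ebp + LOCAL_X, sizeof v);
    return v;
}
```

Running `set_x` under two different tasks writes to two different stacks, and after switching back each task sees its own value, even though the code and the offset are identical.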

zx485
  • The question didn't mention x86. You should probably say "on x86 for example" in there somewhere before the x86-specific stuff. Many RISC machines don't have any convention of using a base pointer. e.g. MIPS doesn't have a `push` instruction, only load/store, and no disadvantage to referencing locals relative to the stack pointer vs. any other register. – Peter Cordes Feb 14 '18 at 22:46

The compiler does the job of remembering the offsets; they are simply hardcoded. For example, to load a variable into a register (e.g. `eax`), the compiler would produce something like `mov eax, [esp-4]`, where `esp` is the stack pointer register and 4 is the offset. If a new value is pushed afterwards, the next `mov` that gets or sets the variable will use a bigger offset. All of this is compile-time analysis.

Also, on some platforms the stack may grow in the opposite direction, so the offset will be positive.
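The compile-time bookkeeping described in this answer can be sketched as a hypothetical symbol table in C. It assumes a downward-growing stack with locals at non-negative offsets above ESP (as on x86), 4-byte slots and no alignment handling; `FrameInfo`, `declare_local`, `note_push` and `emit_load` are illustrative names, not part of any real compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compiler-side bookkeeping: each local gets a slot
   offset when declared, and the compiler counts bytes pushed since
   the frame was set up. */
enum { MAX_LOCALS = 8 };

typedef struct {
    const char *names[MAX_LOCALS];
    int slot[MAX_LOCALS];   /* offset from ESP right after the prologue */
    int count;
    int frame_size;         /* bytes reserved for locals so far */
    int pushed;             /* bytes pushed since the prologue */
} FrameInfo;

static void declare_local(FrameInfo *f, const char *name, int size) {
    assert(f->count < MAX_LOCALS);
    f->names[f->count] = name;
    f->slot[f->count] = f->frame_size;
    f->frame_size += size;
    f->count++;
}

static void note_push(FrameInfo *f, int size) {
    f->pushed += size;   /* ESP moved down, so ESP-relative offsets grow */
}

/* Write the load instruction for `name` into buf.
   Returns 1 on success, 0 if the name is unknown. */
static int emit_load(const FrameInfo *f, const char *name, char *buf, size_t n) {
    for (int i = 0; i < f->count; i++) {
        if (strcmp(f->names[i], name) == 0) {
            snprintf(buf, n, "mov eax, [esp+%d]", f->slot[i] + f->pushed);
            return 1;
        }
    }
    return 0;
}
```

Declaring `x` and then `y` gives `y` slot 4, so its load is emitted as `mov eax, [esp+4]`; after a 4-byte push the same variable is emitted as `mov eax, [esp+8]`. Nothing here happens at run time: the numbers end up as constants in the instructions.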

  • Almost all platforms have stacks that grow down, including x86, so your example should be `mov eax, [esp+4]`. Only the System V x86-64 ABI has a red-zone where you can safely put locals *below* `rsp` in leaf functions, even in the general case when you have signal handlers installed. (And in that case, you'd use `[rsp-4]`. Although gcc did have missed optimizations when compiling for the x32 ABI (32-bit pointers in long mode) where it would use the address-size prefix even for accessing the stack, so you could actually get some gcc versions to emit `mov eax, [esp-4]` with the right options) – Peter Cordes Feb 15 '18 at 03:15