Why does creating a pointer of a local variable require the procedure to allocate space on the stack?

Question

I was reading the third chapter of "Computer Systems: A Programmer’s Perspective." In the section "Local Storage on the Stack," the book says:

Most of the procedure examples we have seen so far did not require any local storage beyond what could be held in registers. At times, however, local data must be stored in memory. Common cases of this include these: The address operator ‘&’ is applied to a local variable, and hence we must be able to generate an address for it.

I don't understand the reason for this. Consider this example from the book:

long caller()
{
    long arg1 = 534;
    long arg2 = 1057;
    long sum = swap_add(&arg1, &arg2);
    long diff = arg1 - arg2;
    return sum * diff;
}

The function swap_add requires two pointer arguments, so the caller needs to allocate space on its stack for the addresses of the local variables arg1 and arg2. I understand that you cannot have a pointer to reference a register, but I don't understand the reason for that.

Why can't we store arg1 and arg2 in registers and use &arg1 and &arg2 to reference them? What is the consequence of doing so? The book focuses on x86-64, but I would love to know about other architectures as well.

What do you mean "the address of registers"? Registers don't _have_ addresses. To have a pointer to something, the something needs to be somewhere _that actually has an address_. I'd suggest coming at this as an assembly/opcode/instruction-set question, not a C question; after all, if you can't find a way to write assembly that does what you want, the compiler can't do it either. — Charles Duffy, Aug 25 '23 at 03:15
(Pointers refer to main memory; registers aren't _in_ main memory, they're in the CPU; if you were to propose having special address values that refer to registers... that would need to be designed into the CPU itself if you wanted it to be even _remotely_ efficient -- the other option is having 16 different variants of each opcode and code to JMP to the right one after comparing a pointer to your special values -- so this isn't something a compiler can realistically do without hardware support). — Charles Duffy, Aug 25 '23 at 03:26
*I understand that you cannot have a pointer to reference a register, but I don't understand the reason for that.* - Registers like RAX have register numbers (0 to 15), which are only usable when embedded into machine code. They're a separate address-space from memory, and no indirect addressing of register numbers is supported, only register-direct. — Peter Cordes, Aug 25 '23 at 03:55
So even in assembly language or machine code, there's no way you can do indirect access to a register (with a reg num in another register); if you have some stuff in regs that you need to index, you have to store them to memory and index that array. Or write self-modifying code which stuffs the 3 bits of the register number into the ModR/M byte and the high bit into a REX prefix, but that will perform extremely badly (full pipeline nuke on out-of-order-exec CPUs), so only something you'd do while JIT-compiling something once to run many times. — Peter Cordes, Aug 25 '23 at 03:56
"_so the caller needs to allocate space on its stack for the addresses of..._" The allocation for local variables to a function is done at compile time, not run time. References to "stack offset addresses" for variables are assigned by the compiler during compilation (eg: StackPointer + 4)... — Fe2O3, Aug 25 '23 at 04:19
"so the caller needs to allocate space on its stack for the addresses of the local variables arg1 and arg2." Actually, the space is needed for the variables `arg1` and `arg2`, not for their addresses. — Gerhardh, Aug 25 '23 at 06:54
I'd recommend studying the basics of an assembler language (ideally something simpler than x86, for example a low-end MCU or ye olde MC68k). Then it will become obvious why you can't take the address of registers in C. Because C is mostly just a somewhat thin abstraction layer on top of asm, the address-of operator in C is analogous to indirect addressing in assembler. — Lundin, Aug 25 '23 at 08:46
There have been systems where registers *did* have an address, like the [Univac 1100 series](https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#Registers). Simplifies the instruction format in interesting ways. — BoP, Aug 25 '23 at 11:14
The HP/1000 had two 16-bit registers, A & B, which were also aliased to memory locations 0 and 1, respectively. Still, that was not all that useful then, and we see this feature being dropped from modern architectures, as it would likely complicate modern hardware for little value. — Erik Eidt, Aug 25 '23 at 16:57

Marco Bonelli · Answer 1 · 2023-08-25T03:58:26.973

Why can't we store arg1 and arg2 in registers and use &arg1 and &arg2 to reference them?

Because CPU registers are not stored in main memory and therefore don't have any memory address associated with them. You can see them as very fast specialized memory cells that live within the CPU itself. Therefore, if you need the address of any variable you have declared locally you will have to use stack storage instead of a register for that variable to have a memory address.

This is why in general whenever a compiler sees the & operator it needs to allocate the variable somewhere in memory. After the operation that uses the memory address is done, the value can be put back into a register.

If the compiler is smart enough and notices that the use of the & operator does not have side effects on other variables and that the address of the variable does not escape the current function, then it may also decide to keep the variable in a register and simplify the & operation, but that is a different story.
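To make the escaping vs. non-escaping distinction concrete, here is a minimal sketch in C. The names store_value, address_stays_local and address_escapes are made up for illustration, and the comments describe what optimizing compilers typically do, not something the C standard guarantees.

```c
/* Hypothetical callee that writes through a pointer. Imagine it lives in
   another translation unit, so the compiler cannot see its body. */
void store_value(long *p, long v)
{
    *p = v;
}

/* The address of x is taken, but it never leaves this function. An optimizing
   compiler can usually see through the pointer and keep x in a register. */
long address_stays_local(long v)
{
    long x = v;
    long *p = &x;
    *p += 1;
    return x;
}

/* The address of x is handed to another function. If that call is not
   inlined, x needs a real memory location (normally a stack slot) for the
   duration of the call, because the callee can only reach it through memory. */
long address_escapes(long v)
{
    long x = 0;
    store_value(&x, v);
    return x + 1;
}
```

Both functions return v + 1; the difference only shows up in the generated assembly, where the second one typically reserves stack space for x.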

John Bollinger · Accepted Answer · 2023-08-25T03:30:56.920

Why does creating a pointer of a local variable require the procedure to allocate space on the stack?

The book is distinguishing between local variables that have storage allocated for them in main memory and those stored only in registers. This is not a distinction you can actually see in C, but there are a few C features that affect it. Chief among these is applying the unary & operator to obtain the address of a local variable.

The reason is simple: for the & operator to obtain an object's address, that object must have an address. Only objects with storage assigned to them in memory do. Or at least, that's the presumption on which the book is relying, and it is true on a wide variety of current and historical architectures.

Bear in mind, too, that among the main reasons for wanting the address of a local variable is to provide for non-local accesses to it. For example, to enable scanf() to store a value in it. But when another function has control, it can use the CPU's registers how it wants, with a few caveats, and for the most part, it is not aware of how other functions further up the call chain may be using them.

That does not mean that a function cannot hold a variable's value in a register for a time, or even most of the time, but if it is assigned to memory then there are times when that value must be read from or written to memory.
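As a small sketch of that non-local access, the following uses sscanf() (the string-reading sibling of scanf()) so it can be tried without typing input. The function name parse_long and the -1 failure value are arbitrary choices made for this illustration.

```c
#include <stdio.h>

/* Parses a decimal long from text; returns -1 if nothing could be parsed.
   (-1 is just a sentinel picked for this sketch.) */
long parse_long(const char *text)
{
    long value = 0;

    /* sscanf is compiled separately and knows nothing about this function's
       register usage. The only way it can deliver a result into 'value' is
       through a memory address, so 'value' must live in memory here. */
    if (sscanf(text, "%ld", &value) != 1)
        return -1;

    /* After the call the compiler is free to load 'value' back into a
       register and keep working with it there. */
    return value;
}
```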

*This is not a distinction you can actually see in C, but there are a few C features that affect it.* - another example is the `register` keyword which still exists in C (unlike recent C++), and is only usable on variables that haven't had the `&` operator used on them. Another way to make the book's point is to say that all the variables they've looked at so far were candidates for `register long foo` (and modern compilers would treat them that way with optimization enabled, even without the keyword). — Peter Cordes, Aug 25 '23 at 03:49
There are so many awesome answers. It's too sad that I can only select one as the accepted answer. — John Smith, Aug 25 '23 at 05:16
Agreed, @PeterCordes. However, the `register` keyword is potentially misleading (which is why I didn't mention it explicitly) because it does not actually specify register allocation. It just prevents taking the address of so-qualified objects, and *hints* that accesses should be made as fast as possible. That's consistent with register allocation, but the choice is still up to the compiler, and the OP should understand that modern optimizing compilers are usually better than humans at choosing which variables to allocate in registers. — John Bollinger, Aug 25 '23 at 12:22

Peter Cordes · Answer 3 · 2023-08-25T19:23:12.543

I understand that you cannot have a pointer to reference a register, but I don't understand the reason for that.

Registers like RAX have register numbers (0 to 15 since x86-64 has 16 general-purpose registers), which are only usable when embedded into machine code. They're a separate address-space from memory, and no indirect addressing of register numbers is supported, only register-direct.

So even in assembly language or machine code, there's no way you can do indirect access to a register (with a reg num in another register); if you have some stuff in regs that you need to index, you have to store them to memory and index that array.

Or write self-modifying code which stuffs the 3 bits of the register number into the ModR/M byte and the high bit into a REX prefix, but that will perform extremely badly (full pipeline nuke on out-of-order-exec CPUs), so only something you'd do while JIT-compiling something once to run many times.

C pointers point into the memory address space, not I/O space (port numbers for x86 in / out instructions) and not register numbers. This is true across all mainstream architectures.

When someone says "pointer" without additional context, they always mean a memory address. With context, they might possibly mean an I/O address, but even that would be unusual. "Pointer" never means register number, although register numbers are sometimes called "addresses" in computer science when talking about a "3-address architecture" (e.g. add dst, src1, src2, vs. a 2-address architecture doing add dst, src as dst += src, which can't non-destructively write the result to a 3rd operand).

In a register machine (like x86-64 and all modern CPU ISAs), these "addresses" will usually be register numbers, although x86-64 allows one of the explicit addresses to be a memory operand, including with addressing-modes like [rip+rel32] not involving any normal registers, so you can directly address static data.

In an accumulator machine with only an accumulator, instructions would be things like add [mem] with the accumulator as an implicit destination, so the "address" would be a memory address, probably taking up 16 bits of space in the machine code of every instruction for a 1-operand accumulator machine, vs. a 3-address machine with 32 registers (like MIPS) taking up 15 bits in every instruction for three register numbers. (Most real 8-bit micros like 6502 and 8080 also had a couple of other regs for indirect memory addressing and/or temporaries, even though the accumulator was the only one that could be the destination of most math instructions.)

x86-64 is a 2-address architecture for most things like classic integer instructions, with some compact 1-address instructions like inc. It is 3-operand for FP/SIMD math with AVX extensions, otherwise 2-address with SSE2. Legacy x87 is a 1-operand register-stack design. (Modern x86-64 includes a whole tasting menu of ISA design choices, for better or worse, often worse :P)

So register numbers are "addresses" in this sense, but not in the sense where you can "take their address" and get a pointer.

Fun fact: historical C/C++ have a register keyword which can only be applied to local vars whose address is never taken. Compilers now (and humans writing asm) don't need the hint (except in unoptimized debug builds), so your textbook is correctly treating all local vars which haven't had their address taken as implicitly register. They're pointing out that isn't possible for variables that do have their address taken, especially when the address is passed to another function that isn't getting inlined.
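A minimal sketch of what the register keyword does and doesn't allow in C (the function name sum_to is just for illustration):

```c
/* Sums 1..n using locals declared with the 'register' storage class. */
long sum_to(long n)
{
    register long total = 0;

    for (register long i = 1; i <= n; i++)
        total += i;

    /* Uncommenting the next line makes this a compile-time error in C,
       because '&' may not be applied to a register-qualified object:
       long *p = &total;
    */

    return total;
}
```

The keyword doesn't force the variable into a register; it forbids taking the address, which is exactly the property that makes register allocation straightforward.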

If the call were being inlined, the compiler could optimize away the address-taking and dereferencing of a simple swap-by-reference function and just keep track of the fact that the C variable names are now associated with opposite registers, using zero asm instructions. (Or an xchg reg,reg if it was in a loop and unrolling didn't make the swap go away, or for whatever other reason.)

But of course an optimizing compiler would also do constant propagation aka constant folding and just compile the function to mov eax, imm32 / ret, since the function used two locals initialized with constants, instead of two function args with runtime-variable values. If you want to actually look at compiler-generated asm, write int foo(int a, int b), not local constants, so you can enable optimization and still have code to look at. (See How to remove "noise" from GCC/clang assembly output?)

Or in this case, a non-inline function call would be sufficient. So just prototype swap_add but don't define it. Or if you also want to look at its asm, either give it a different name so the compiler doesn't know it's the function you wanted to call, or declare it with __attribute__((noinline,noipa)) for GCC or clang. (no IPA = no Inter-Procedural Analysis. Even when not inlining, GCC can still notice stuff about another function, like which registers it leaves unmodified, or if it doesn't do anything, and optimize callers accordingly, not treating it as a black box.)
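Putting those suggestions together, a test file for looking at compiler output might look like the sketch below. The NO_INLINE macro is a name invented here; it falls back to plain noinline (or nothing) on compilers that don't report support for noipa. The body of swap_add is the usual swap-and-sum written from the description in the question, so treat it as an assumption rather than the book's exact listing.

```c
/* Pick the strongest "treat this as a black box" attribute available. */
#if defined(__has_attribute)
#  if __has_attribute(noipa)
#    define NO_INLINE __attribute__((noinline, noipa))
#  elif __has_attribute(noinline)
#    define NO_INLINE __attribute__((noinline))
#  endif
#endif
#ifndef NO_INLINE
#  define NO_INLINE
#endif

/* Swaps *xp and *yp and returns their sum. */
NO_INLINE long swap_add(long *xp, long *yp)
{
    long x = *xp;
    long y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

/* Function args instead of local constants, so constant propagation
   can't reduce the whole thing to a single mov / ret. */
long caller(long a, long b)
{
    long arg1 = a;
    long arg2 = b;
    long sum = swap_add(&arg1, &arg2);  /* addresses escape: stack slots needed */
    long diff = arg1 - arg2;
    return sum * diff;
}
```

Compiling with something like gcc -O2 -S should then show caller storing both args to the stack before the call and reloading them afterwards.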

A few architectures (e.g. AVR, an 8-bit RISC microcontroller) have the storage space for their registers also accessible in memory address-space, i.e. their CPU registers are "memory mapped". Presumably that means they really do just use part of their onboard SRAM storage as a register file. (With extra read + write ports in that part of it I guess.) This would be a disaster for out-of-order execution or even aggressive pipelining where the CPU needs to detect "hazards", like dependencies between instructions. Comparing 4-bit numbers that are only ever hard-coded into machine instructions is way easier than having loads and stores also potentially reading and writing register values.

This is as far as the cross-over between registers and memory gets: even though AVR registers are accessible through memory addresses, the pointer is a memory address. You deref it with normal load/store instructions.

And even on architectures where registers have memory addresses, see Joshua's answer - the code in some function you're calling might save/restore those registers around some other use for them. In that sense, registers are like global variables that each function uses at different times. If you follow a calling convention correctly, they work as private local variables. But if you started taking their address and passing it to other functions, their reuse across functions would be a problem. So pointers to the memory space occupied by registers are only useful in asm where you're taking this into account; C compilers for AVR and other such ISAs can't use those addresses when taking the address of a local variable and passing it to other functions.

AVR's General Purpose Registers (GPRs) do not live in RAM. Old AVRs (non-Xmega cores) map a part of the generic address space (the first 32 addresses) to GPRs. IMO taking their address in C/C++ and using them is Undefined Behaviour. Apart from that, in new AVR devices, GPRs are no longer mapped to any address space and the I/O address range starts at 0x0. — emacs drives me nuts, Aug 26 '23 at 09:41
@emacsdrivesmenuts: Thanks, I hadn't realized that had changed. And yeah, I think my (and Joshua's) answer make the point that C++ compilers can't generate code that uses those addresses. Or if you did it manually via inline asm or `unsigned char *regptr = 0x3` or something, then yeah it would be UB. — Peter Cordes, Aug 26 '23 at 12:24

Joshua · Answer 4 · 2023-08-25T19:33:56.560

In the bygone days there were such things as memory mapped registers. Let's say we had them. Here is an architecture with memory-mapped registers, the addresses of which can be obtained by &rn in the assembler. This architecture is defined to have 8 registers (r0 through r7), of which r0 is the return value register, r7 is the stack pointer, and the calling convention defines all other registers as callee-saved.

Let's see what happens when we try to pass addresses to them to other functions. The code would look something like this:

_caller:
    push r1
    push r2
    mov  r1, 534
    mov  r2, 1057
    push &r1
    push &r2
    call swap_add
    add  r7, 16
    sub  r1, r2
    mul  r0, r1
    pop  r2
    pop  r1
    ret

Which looks ok but doesn't work. The problem is swap_add looks something like this:

_swap_add:
   push r1
   push r2
   push r3
   mov  r1, [r7 + 8 + 24]
   mov  r2, [r7 + 16 + 24]
   mov  r0, [r1]
   mov  r3, [r2]
   mov  [r1], r3
   mov  [r2], r0
   add  r0, r3
   pop  r3
   pop  r2
   pop  r1
   ret

r1 and r2 got overwritten. Registers are global variables (really, thread local variables) that get used as scratch values in every function.

If you could take the address of a register, you could not pass that address to another function. Your code would not work because that function would use that register for its own purpose.
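The same failure can be imitated in plain C by modelling the memory-mapped register file as a global array. This is only a simulation of the made-up architecture above: reg[], swap_add_sim and the two swap_seen_* helpers are all invented for this sketch, and uintptr_t is used so that a "register" can hold either a number or an address.

```c
#include <stdint.h>

/* Simulated memory-mapped register file: reg[0] is r0, reg[1] is r1, ... */
uintptr_t reg[8];

/* Callee following the convention above: it saves r1..r3, uses them as
   scratch for its own work, then restores them before returning. */
uintptr_t swap_add_sim(uintptr_t *xp, uintptr_t *yp)
{
    uintptr_t saved1 = reg[1], saved2 = reg[2], saved3 = reg[3]; /* push */

    reg[1] = (uintptr_t)xp;            /* load the two pointer arguments */
    reg[2] = (uintptr_t)yp;
    reg[0] = *(uintptr_t *)reg[1];     /* x = *xp */
    reg[3] = *(uintptr_t *)reg[2];     /* y = *yp */
    *(uintptr_t *)reg[1] = reg[3];     /* *xp = y */
    *(uintptr_t *)reg[2] = reg[0];     /* *yp = x */
    reg[0] += reg[3];                  /* return value: x + y */

    reg[3] = saved3; reg[2] = saved2; reg[1] = saved1;           /* pop */
    return reg[0];
}

/* Caller keeps its locals in r1/r2 and passes the registers' addresses.
   Returns 1 if the caller observes the values swapped, else 0. */
int swap_seen_with_register_addresses(void)
{
    reg[1] = 534;
    reg[2] = 1057;
    swap_add_sim(&reg[1], &reg[2]);
    return reg[1] == 1057 && reg[2] == 534;
}

/* Caller keeps its locals in ordinary memory (stack slots) instead. */
int swap_seen_with_stack_addresses(void)
{
    uintptr_t arg1 = 534, arg2 = 1057;
    swap_add_sim(&arg1, &arg2);
    return arg1 == 1057 && arg2 == 534;
}
```

With stack slots the swap is visible to the caller. With register addresses the callee overwrites r1 and r2 with its own pointers before reading through them, and the final restore puts the old values back, so the caller never sees a swap and the "sum" is meaningless.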
