72

In C, let's say you have a variable called variable_name. Let's say it's located at 0xaaaaaaaa, and at that memory address, you have the integer 123. So in other words, variable_name contains 123.

I'm looking for clarification around the phrasing "variable_name is located at 0xaaaaaaaa". How does the compiler recognize that the string "variable_name" is associated with that particular memory address? Is the string "variable_name" stored somewhere in memory? Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?

Peter O.
Tyler
  • Not quite an answer, but this might fill in some blanks: http://www.csee.umbc.edu/~chang/cs313.s02/stack.shtml – Douglas Jan 30 '13 at 19:47
  • Well, absent debug info, the variable name is not stored in memory. If you want to understand this you first need to understand machine language and assembly language. – Hot Licks Feb 08 '13 at 02:02

5 Answers

106

Variable names don't exist anymore after the compiler runs (barring special cases like exported globals in shared libraries or debug symbols). The entire act of compilation is intended to take the symbolic names and algorithms represented by your source code and turn them into native machine instructions. So yes, if you have a global variable_name, and the compiler and linker decide to put it at 0xaaaaaaaa, then wherever it is used in the code, it will just be accessed via that address.

So to answer your literal questions:

How does the compiler recognize that the string "variable_name" is associated with that particular memory address?

The toolchain (compiler and linker) works together to assign a memory location for the variable. It's the compiler's job to keep track of all the references, and the linker puts in the right addresses later.

Is the string "variable_name" stored somewhere in memory?

Only while the compiler is running.

Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?

Yes, that's pretty much what happens, except it's a two-stage job with the linker. And yes, it uses memory, but it's the compiler's memory, not anything at runtime for your program.

An example might help you understand. Let's try out this program:

int x = 12;

int main(void)
{
    return x;
}

Pretty straightforward, right? OK. Let's take this program, and compile it and look at the disassembly:

$ cc -Wall -Werror -Wextra -O3    example.c   -o example
$ otool -tV example
example:
(__TEXT,__text) section
_main:
0000000100000f60    pushq   %rbp
0000000100000f61    movq    %rsp,%rbp
0000000100000f64    movl    0x00000096(%rip),%eax
0000000100000f6a    popq    %rbp
0000000100000f6b    ret

See that movl line? It's grabbing the global variable (using instruction-pointer-relative addressing, in this case). No more mention of x.

Now let's make it a bit more complicated and add a local variable:

int x = 12;

int main(void)
{  
    volatile int y = 4;
    return x + y;
}

The disassembly for this program is:

(__TEXT,__text) section
_main:
0000000100000f60    pushq   %rbp
0000000100000f61    movq    %rsp,%rbp
0000000100000f64    movl    $0x00000004,0xfc(%rbp)
0000000100000f6b    movl    0x0000008f(%rip),%eax
0000000100000f71    addl    0xfc(%rbp),%eax
0000000100000f74    popq    %rbp
0000000100000f75    ret

Now there are two movl instructions and an addl instruction. You can see that the first movl is initializing y, which the compiler has decided will live on the stack (base pointer - 4). Then the next movl loads the global x into the eax register, and the addl adds y to that value. But as you can see, the literal x and y strings don't exist anymore. They were conveniences for you, the programmer, but the computer certainly doesn't care about them at execution time.

Carl Norum
  • Great explanation for how C does it. Note that some other languages (particularly modern "dynamic" or "scripting" languages) may maintain symbolic names for data, and they do indeed use memory at runtime to keep that mapping information around. – Russell Borogove Jan 30 '13 at 23:11
  • This is a great answer. Just one question: how does the compiler know the address before the program actually runs? I thought memory is dynamically allocated (so the addresses might be different in different runs). – Jackson Tale May 28 '14 at 21:40
  • @JacksonTale - that depends a lot on how a particular system is configured, but on most common systems, virtual memory means your process always has the same logical view of memory even if the underlying physical addresses change from run to run. In the examples I used above, the variables are being addressed relative to the instruction pointer, not by an absolute address anyway. – Carl Norum May 29 '14 at 05:39
  • @CarlNorum So you mean, when in runtime, the physical real memory addresses will be mapped to the virtual memory addresses? – Jackson Tale May 29 '14 at 08:36
  • Yes, that's the point of virtual memory - to provide programs a uniform view of the memory system regardless of the underlying reality. – Carl Norum May 29 '14 at 15:08
  • @JacksonTale this type of compiled code is also known as 'unmanaged' code, where there's no runtime environment to make the conversion between identifiers and virtual addresses, so it's done at compile time. For managed code, as in Java or C#, as far as I remember this conversion is generally done by the loaders at runtime, but not necessarily (the compiler or linker might also take care of it depending on the configuration). – stdout Aug 11 '17 at 10:50
  • @zgulser For C#, the names of locals [are erased at compile-time](https://pastebin.com/pHeHRS8v). The names of methods themselves, and fields (basically the names of type members) are preserved for reflection and dynamic linking. I believe it's the same story with Java. – cdhowie Sep 18 '17 at 18:53
  • `0xfc` is `0x100 - 4`, so it seems quite related to the `-4`, but why `0xfc`, not `-0x4`? Could someone please give me any hint? – starriet Feb 22 '22 at 10:34
  • `0xfc` is the 8-bit two's complement representation of `-4`. – Carl Norum Feb 23 '22 at 17:41
14

A C compiler first creates a symbol table, which stores the relationship between the variable name and where it's located in memory. When compiling, it uses this table to replace all instances of the variable with a specific memory location, as others have stated. You can find a lot more on it on the Wikipedia page.

MichaelThiessen
10

The compiler substitutes every variable name. First it replaces the names with references, and later the linker replaces those references with addresses.

In other words, the variable names are no longer available once the compiler has finished.

junix
  • `First references..then addresses` is the important bit to know when compiling multiple files and linking later. +1 for that. – P.P Jan 30 '13 at 19:58
6

This is what's called an implementation detail. While what you describe is the case in all compilers I've ever used, it's not required to be the case. A C compiler could put every variable in a hash table and look them up at runtime (or something like that), and in fact early JavaScript interpreters did exactly that (now they do just-in-time compilation that results in something much more raw).

Specifically for common compilers like VC++, GCC, and LLVM: the compiler will generally assign a variable to a location in memory. Variables of global or static scope get a fixed address that doesn't change while the program is running, while variables within a function get a stack address, that is, an address relative to the current stack pointer, which changes every time a function is called. (This is an oversimplification.) Stack addresses become invalid as soon as the function returns, but have the benefit of having effectively zero overhead to use.

Once a variable has an address assigned to it, there is no further need for the name of the variable, so it is discarded. Depending on the kind of name, it may be discarded at preprocess time (for macro names), compile time (for static and local variables/functions), or link time (for global variables/functions). If a symbol is exported (made visible to other programs so they can access it), the name will usually remain somewhere in a "symbol table", which does take up a trivial amount of memory and disk space.

Jonathan Grynspan
4

Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it

Yes.

and if so, wouldn't it have to use memory in order to make that substitution?

Yes. But that memory belongs to the compiler, not to your program. Once the compiler has finished compiling your code, why would you care about the memory it used?

vanza