What exactly constitutes a "variable" differs from language to language. It also matters what kind of a runtime environment is used - native binary (C/C++/Fortran/Cobol/Pascal), bytecode in a virtual machine (Java/C#/Scala/F#), a source-level interpreter (old-skool BASIC, bash/csh/sh), etc.
In the case of C, a variable is simply a chunk of memory large enough to hold the value of the specified type - there is no metadata associated with that memory chunk that tells you anything about its name (which typically isn't preserved in the machine code), its type, whether it's part of an array or not, etc. IOW, if you examined an integer variable in memory in a running program, all you'd see is the value stored in that integer. You wouldn't see any other information stored about that variable.
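To make that concrete, here's a minimal sketch that copies an int out to raw bytes and back again. The helper names (int_to_bytes, int_from_bytes, dump_int) are mine, made up for illustration; the point is only that the bytes of the value are all there is.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy the raw bytes of an int into out, which must have room for
   sizeof(int) bytes. Returns the number of bytes copied. */
size_t int_to_bytes(int value, unsigned char *out)
{
    memcpy(out, &value, sizeof value);
    return sizeof value;
}

/* Rebuild the int from those bytes. Nothing else is needed, because
   nothing else was ever stored. */
int int_from_bytes(const unsigned char *in)
{
    int value;
    memcpy(&value, in, sizeof value);
    return value;
}

/* Print the bytes in memory order (the order depends on endianness). */
void dump_int(int value)
{
    unsigned char bytes[sizeof(int)];
    size_t n = int_to_bytes(value, bytes);

    for (size_t i = 0; i < n; i++)
        printf("%02x ", (unsigned)bytes[i]);
    putchar('\n');
}
```

Calling dump_int(42) from main on a typical little-endian x86 machine with a 4-byte int shows the bytes 2a 00 00 00, and that's all: no name, no type tag, no length.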
During translation (i.e., while the code is being compiled), the compiler maintains an internal table that keeps track of variables, variable names, types, scope, visibility, etc. However, none of that information (usually) makes it into the generated machine code. auto (local) variables are typically referred to by an offset from a given stack address. static variables typically have a fixed address. Values of different types are dealt with by using different machine code instructions (for example, there are usually separate instructions for dealing with integers vs. floats).
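Here's a small sketch of the auto vs. static difference as you can observe it from inside a program. The function names are hypothetical, and I'm assuming uintptr_t is available (it's technically optional in the standard, but practically every hosted compiler provides it).

```c
#include <stdint.h>

/* An auto variable is created afresh in each call's stack frame,
   so the value from the previous call is gone. */
int count_with_auto(void)
{
    int calls = 0;
    calls++;
    return calls;
}

/* A static variable lives at one fixed location for the whole run,
   so its value survives between calls. */
int count_with_static(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* Return the address of a static variable as an integer; it is the
   same object, at the same address, on every call. */
uintptr_t static_address(void)
{
    static int anchor;
    return (uintptr_t)&anchor;
}
```

count_with_auto() returns 1 every time, count_with_static() returns 1, 2, 3, and so on, and static_address() gives the same number on every call within a run.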
A pointer variable simply stores an address. The exact format of that address will vary based on the system, but on modern x86 and similar systems, it's essentially an unsigned integer value. On a segmented memory system, it may be a pair of values (segment and offset).
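If you want to look at a pointer as "just a number", something like the following sketch works on flat-address systems such as modern x86. The helper names are hypothetical, and the pointer-to-integer mapping is implementation-defined, so treat the arithmetic as an assumption about typical platforms rather than a guarantee.

```c
#include <stdint.h>

/* View a pointer's value as a plain unsigned integer. The mapping is
   implementation-defined; on a flat address space it is the address. */
uintptr_t pointer_as_integer(const void *p)
{
    return (uintptr_t)p;
}

/* Convert the integer back to a pointer. Round-tripping a valid void *
   through uintptr_t gives back a pointer equal to the original. */
void *integer_as_pointer(uintptr_t n)
{
    return (void *)n;
}
```

For an array int arr[2], the integers you get for &arr[0] and &arr[1] will typically differ by exactly sizeof(int), which is all pointer arithmetic is doing under the hood on such a system.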
EDIT
C code is typically compiled into a native binary (although there's at least one compiler that targets the Java VM, and there may be compilers that target other virtual machines). On an x86-like system, a running native binary is typically laid out like this in (virtual!) memory:
              +-------------------------+
High address: | Environment variables   |
              | and command line args   |
              +-------------------------+
              |          Stack          |
              |            |            |
              |            V            |
              |            ^            |
              |            |            |
              |          Heap           |
              +-------------------------+
              | Read-only data items    |
              +-------------------------+
              | Global data items       |
              +-------------------------+
              | Program text (machine   |
 Low address: | code)                   |
              +-------------------------+
The exact details vary from system to system, but this is a decent overall view.
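You can poke at this layout yourself with a sketch like the one below, which prints one address from each region. The names (global_item, literal, print_layout) are made up for the example, and I'm assuming string literals end up with the read-only data, which is typical but not required. With address space randomization the actual numbers change from run to run, and the relative order may not match the picture exactly on every system.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int global_item = 42;                 /* global data */
const char *literal = "read-only";    /* points at a string literal */

/* Print one address from each region of the layout sketched above.
   Returns 0 on success, -1 if the heap allocation fails. */
int print_layout(void)
{
    int local = 0;                               /* lives on the stack */
    int *heap_item = malloc(sizeof *heap_item);  /* lives on the heap  */

    if (heap_item == NULL)
        return -1;

    printf("stack   : 0x%" PRIxPTR "\n", (uintptr_t)&local);
    printf("heap    : 0x%" PRIxPTR "\n", (uintptr_t)heap_item);
    printf("literal : 0x%" PRIxPTR "\n", (uintptr_t)literal);
    printf("global  : 0x%" PRIxPTR "\n", (uintptr_t)&global_item);
    printf("code    : 0x%" PRIxPTR "\n", (uintptr_t)&print_layout);

    free(heap_item);
    return 0;
}
```

Call print_layout() from main and compare the numbers against the diagram for your own platform.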
Each time a function is called (including main), memory is taken from the stack to build what is called a stack frame. The stack frame contains space for the function arguments (if any), local variables (if any), address of the previous stack frame, and the address of the next instruction to execute after the function returns.
              +--------------------+
High address: | Function arguments |
              +--------------------+
              | Return address     |
              +--------------------+
              | Prev frame address | <-- %rbp/%ebp (frame pointer)
              +--------------------+
 Low address: | Local variables    | <-- %rsp/%esp (stack pointer)
              +--------------------+
The %rsp (64-bit) or %esp (32-bit) register stores the address of the top of the stack (on x86, the stack grows "down" towards decreasing addresses), and the %rbp (64-bit) or %ebp (32-bit) register stores the address of the stack frame. Function arguments and local variables are referred to via offsets from the frame pointer, such as
-4(%rbp) -- object starting 4 bytes "below" current frame address
32(%rbp) -- object starting 32 bytes "above" current frame address
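You can see that every active call gets its own frame, with its own copy of the locals, using a sketch like this one. record_frames is a hypothetical helper that records the address of one local per recursion level; the volatile and the write after the recursive call are only there to discourage the compiler from optimizing the local away.

```c
#include <stdint.h>

/* Record the address of one local variable per active call.
   out must have room for depth + 1 entries. */
void record_frames(int depth, uintptr_t *out)
{
    volatile int marker = depth;   /* a local in this call's frame */

    out[depth] = (uintptr_t)&marker;
    if (depth > 0)
        record_frames(depth - 1, out);

    marker = 0;   /* keeps marker alive across the recursive call */
}
```

After record_frames(2, addrs) with a three-element array, all three recorded addresses are different, since all three markers existed at the same time. On x86, where the stack grows down, the deeper calls will usually report the lower addresses, but that part depends on the platform and compiler.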
Here's an example - we have a function foo that takes two int arguments and has two int local variables:
#include <stdio.h>

void foo( int x, int y )
{
  int a;
  int b;

  a = 2 * x + y;
  b = x - y;

  printf( "x = %d, y = %d, a = %d, b = %d\n", x, y, a, b );
}
Here's the generated 32-bit x86 assembly for that function (macOS 10.13, LLVM version 9.1.0):
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 13
.globl _foo ## -- Begin function foo
.p2align 4, 0x90
_foo: ## @foo
.cfi_startproc
## BB#0:
pushl %ebp
Lcfi0:
.cfi_def_cfa_offset 8
Lcfi1:
.cfi_offset %ebp, -8
movl %esp, %ebp
Lcfi2:
.cfi_def_cfa_register %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $60, %esp
Lcfi3:
.cfi_offset %esi, -20
Lcfi4:
.cfi_offset %edi, -16
Lcfi5:
.cfi_offset %ebx, -12
calll L0$pb
L0$pb:
popl %eax
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
leal L_.str-L0$pb(%eax), %eax
movl 8(%ebp), %esi
shll $1, %esi
addl 12(%ebp), %esi
movl %esi, -16(%ebp)
movl 8(%ebp), %esi
subl 12(%ebp), %esi
movl %esi, -20(%ebp)
movl 8(%ebp), %esi
movl 12(%ebp), %edi
movl -16(%ebp), %ebx
movl %eax, -24(%ebp) ## 4-byte Spill
movl -20(%ebp), %eax
movl %eax, -28(%ebp) ## 4-byte Spill
movl -24(%ebp), %eax ## 4-byte Reload
movl %eax, (%esp)
movl %esi, 4(%esp)
movl %edi, 8(%esp)
movl %ebx, 12(%esp)
movl -28(%ebp), %esi ## 4-byte Reload
movl %esi, 16(%esp)
movl %edx, -32(%ebp) ## 4-byte Spill
movl %ecx, -36(%ebp) ## 4-byte Spill
calll _printf
movl %eax, -40(%ebp) ## 4-byte Spill
addl $60, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "x = %d, y = %d, a = %d, b = %d\n"
.subsections_via_symbols
Here's what our stack frame will look like:
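              +---+
High address: | y | 12(%ebp)
              +---+
              | x | 8(%ebp)
              +---+
              |   | return address, 4(%ebp)
              +---+
              |   | address of previous frame, 0(%ebp)
              +---+
              |   | saved %ebx, %edi, %esi: -4(%ebp) to -12(%ebp)
              +---+
              | a | -16(%ebp)
              +---+
              | b | -20(%ebp)
              +---+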
Now, that's how things look in the 32-bit world. 64-bit gets a little more complicated - some function arguments are passed in registers rather than on the stack, so the nice neat picture above breaks down.
Now, I'm talking about the concept of a variable at runtime, which is what I think you were asking about.