What information do C variables contain under the hood?

Question

I'm working through K&R C and this line stood out to me:

A pointer is a variable that contains the address of a variable.

I always assumed (perhaps incorrectly) that a variable under the hood must contain a name, type, and the address of some location in memory. I.e., while variables can be treated as values, the compiler must know where those values are stored in memory, so variables must also be pointers (conceptually, not formally).

But now I'm not so sure. The text seems to imply that variables are somehow more fundamental than pointers.

What are variables, really? Are they like pointers under the hood, or are they different in some way? Specifically looking to understand this in the context of how memory is allocated.

EDIT: For those engaging in semantic debates... I am interested in understanding the _average_ use case, not what the standard does or doesn't specify, though I should have specified. For functional purposes, let's say C compiled with gcc or similar on a unix machine. Thanks!

A variable is simply a name that allows you to access and interpret an area of memory. When you write `int a;`, you're asking the compiler to allocate `sizeof(int)` space to you and allow you to reference it with `a`. A pointer is simply a different type of variable. — Ken White, Aug 02 '18 at 16:28
The processor doesn't know that much about a variable. It is an *offset* at runtime, relative from the data section or the stack pointer, depending how it was declared. No name. No explicit type, it is implicit from the processor instruction that accesses it. Also where the distinction between a variable and a pointer comes up, different instructions to dereference the pointer vs accessing the variable. The C compiler generates those different instructions, based on the declaration and the C code. — Hans Passant, Aug 02 '18 at 16:33
The standard does not define **what** a variable is, just how they are to be manipulated by the code. Maybe they are little green women, who knows. — too honest for this site, Aug 02 '18 at 16:34
@HansPassant -- but at _compile time_ the compiler must keep track of types in order to issue the required diagnostics for constraint violations. — ad absurdum, Aug 02 '18 at 16:36
Looks like we have different understanding on how deep under the hood we want to get. — Eugene Sh., Aug 02 '18 at 16:37
@EugeneSh.: How could you go any deeper than what the standard specifies without information about a specific implementation? This is already too broad, adding possible implementations just makes it even more off-topic. — too honest for this site, Aug 02 '18 at 16:45
@toohonestforthissite One could understand "under the hood" as "how it is implemented". — Eugene Sh., Aug 02 '18 at 16:49
@EugeneSh.: I think I already covered that: "The standard does not define what a variable is …". This implies the implementation. — too honest for this site, Aug 02 '18 at 16:59
Either the quotation is wrong, or this is another flaw of the book. It's definitely wrong, not only for function pointers but also for data pointers. K&R C is completely outdated anyway: the first edition for about 30 years, the second for about 20. If you go that deep, start reading the standard. — too honest for this site, Aug 02 '18 at 17:02
C is a fully-compiled language. All the "under the hood" stuff is known to the compiler at compile time, but is thrown away in the final running code. — Lee Daniel Crocker, Aug 02 '18 at 19:08
@LeeDanielCrocker: Could you provide a reference to the standard disallowing (explicitly or implicitly) interpreting C source code? — too honest for this site, Aug 02 '18 at 19:36
Nothing prevents C code from being interpreted if someone wanted to do that, other than the fact that it would be very difficult. I'm not aware of any such implementation. — Lee Daniel Crocker, Aug 02 '18 at 20:12
@LeeDanielCrocker: I doubt it's more complicated than a Java or Python interpreter or any other imperative programming language. Btw., there have been C interpreters (not sure if they are still maintained, though) and the more sophisticated static code analysers are in fact interpreters, at least to some extent. — too honest for this site, Aug 02 '18 at 20:43
@HansPassant that's exactly what I was looking to understand, thank you! — Kyle Chadha, Aug 03 '18 at 11:57
To whoever downvoted my question, and pretty much every answer, can I ask why? Perhaps I can pose questions in a better way, in the future — Kyle Chadha, Aug 03 '18 at 11:58

John Bode · Accepted Answer · 2018-08-03T16:15:54.183

What exactly constitutes a "variable" differs from language to language. It also matters what kind of a runtime environment is used - native binary (C/C++/Fortran/Cobol/Pascal), bytecode in a virtual machine (Java/C#/Scala/F#), a source-level interpreter (old-skool BASIC, bash/csh/sh), etc.

In the case of C, a variable is simply a chunk of memory large enough to hold the value of the specified type - there is no metadata associated with that memory chunk that tells you anything about its name (which typically isn't preserved in the machine code), its type, whether it's part of an array or not, etc. IOW, if you examined an integer variable in memory in a running program, all you'd see is the value stored in that integer. You wouldn't see any other information stored about that variable.
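To make the "no metadata" point concrete, here is a minimal sketch that copies out and prints the raw bytes an `int` occupies. The helper names `raw_bytes_of_int` and `dump_int` are made up for this illustration. The byte order you see depends on the machine, but the number of bytes is exactly `sizeof(int)`, with no room for a name or a type tag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the raw bytes of an int into out (must hold sizeof(int) bytes). */
void raw_bytes_of_int(int value, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Print every byte that makes up the variable; nothing else is stored. */
void dump_int(int value)
{
    unsigned char bytes[sizeof(int)];
    raw_bytes_of_int(value, bytes);

    printf("value %d occupies %zu bytes:", value, sizeof value);
    for (size_t i = 0; i < sizeof bytes; i++)
        printf(" %02x", bytes[i]);
    putchar('\n');
}
```

Calling `dump_int(1234)` from `main` prints the bytes of the value and nothing more; copying those bytes back into another `int` recovers the same value.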

During translation (i.e., while the code is being compiled), the compiler maintains an internal table that keeps track of variables, variable names, types, scope, visibility, etc. However, none of that information (usually) makes it into the generated machine code. auto (local) variables are typically referred to by an offset from a given stack address. static variables typically have a fixed address. Values of different types are dealt with by using different machine code instructions (for example, there are usually separate instructions for dealing with integers vs. floats).
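The difference between static and auto storage can be observed from inside a program. The sketch below is only an illustration: the function names are invented, and it assumes an implementation that provides `uintptr_t` (gcc and clang on Unix-like systems do). A static local reports the same address on every call, while each level of a recursive call gets its own copy of an auto variable at a different address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Address of a static local: one object, shared by every call. */
uintptr_t static_counter_address(void)
{
    static int counter;
    counter++;
    return (uintptr_t)(void *)&counter;
}

/* Recurse depth levels deep and record where each level's own auto
   variable lives. out must have room for depth + 1 entries. */
int record_local_addresses(int depth, uintptr_t *out)
{
    assert(depth >= 0);
    int local = depth;
    out[depth] = (uintptr_t)(void *)&local;

    int below = (depth > 0) ? record_local_addresses(depth - 1, out) : 0;

    /* local is read after the recursive call, so it stays live across it
       and every level needs its own storage. */
    return local + below;
}
```

On a typical stack-based implementation the recorded auto addresses differ by a constant stride (one stack frame per call), which matches the "offset from a stack address" description.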

A pointer variable simply stores an address. The exact format of that address will vary based on the system, but on modern x86 and similar systems, it's essentially an unsigned integer value. On a segmented memory system, it may be a pair of values (segment and offset).
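A short sketch of "a pointer is essentially an unsigned integer" on a flat-memory system. It assumes the implementation provides `uintptr_t` and the `PRIxPTR` macro (both are optional in the standard but present with gcc and clang on Unix-like systems); `address_as_integer` and `show_pointer` are names invented here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* View the address held in a pointer as an unsigned integer. */
uintptr_t address_as_integer(const void *p)
{
    return (uintptr_t)p;
}

/* Print a pointer both ways, plus the storage the pointer itself needs. */
void show_pointer(int *p)
{
    assert(p != NULL);
    printf("pointer value (%%p):       %p\n", (void *)p);
    printf("same value as an integer: 0x%" PRIxPTR "\n", address_as_integer(p));
    printf("sizeof(int *) = %zu bytes\n", sizeof p);
}
```

Converting the integer back to `void *` yields a pointer that compares equal to the original, which is the round trip the standard promises for `uintptr_t`. The exact text `%p` prints is implementation-defined.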

EDIT

C code is typically compiled into a native binary (although there's at least one compiler that targets the Java VM, and there may be compilers that target other virtual machines). On an x86-like system, a running native binary is typically laid out like this in (virtual!) memory:

              +-------------------------+
High address: | Environmental variables |
              | and command line args   |
              +-------------------------+
              |        Stack            |
              |          |              |
              |          V              |
              |          ^              |
              |          |              |
              |         Heap            |
              +-------------------------+
              | Read-only data items    |
              +-------------------------+
              | Global data items       |
              +-------------------------+
              | Program text (machine   |
 Low address: | code)                   |
              +-------------------------+

The exact details vary from system to system, but this is a decent overall view.
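One way to see these regions on your own machine is to sample one address from each and print them. This is a rough sketch under several assumptions: the names `global_item`, `read_only_item`, `region_addresses`, `sample_regions` and `print_regions` are invented for the example, `uintptr_t` is assumed to be available, and the relative order of the printed addresses is not guaranteed (address-space randomization and position-independent executables can rearrange things).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int global_item = 42;                 /* initialized global data */
const char read_only_item[] = "ro";   /* typically placed in read-only data */

struct region_addresses {
    uintptr_t global_data;
    uintptr_t read_only;
    uintptr_t heap;
    uintptr_t stack;
};

/* Collect one sample address per region.
   Returns 0 on success, -1 if the heap allocation fails. */
int sample_regions(struct region_addresses *out)
{
    assert(out != NULL);

    int stack_item = 0;                          /* auto variable: stack */
    int *heap_item = malloc(sizeof *heap_item);  /* dynamic storage: heap */
    if (heap_item == NULL)
        return -1;

    out->global_data = (uintptr_t)(void *)&global_item;
    out->read_only   = (uintptr_t)(const void *)read_only_item;
    out->heap        = (uintptr_t)(void *)heap_item;
    out->stack       = (uintptr_t)(void *)&stack_item;

    free(heap_item);
    return 0;
}

/* Print the samples, highest region of the diagram first. */
void print_regions(const struct region_addresses *r)
{
    printf("stack:          0x%" PRIxPTR "\n", r->stack);
    printf("heap:           0x%" PRIxPTR "\n", r->heap);
    printf("read-only data: 0x%" PRIxPTR "\n", r->read_only);
    printf("global data:    0x%" PRIxPTR "\n", r->global_data);
}
```

On many Linux and macOS systems the stack sample is numerically far above the others, in line with the diagram, but treat the output as an observation about one run rather than a rule.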

Each time a function is called (including main), memory is taken from the stack to build what is called a stack frame. The stack frame contains space for the function arguments (if any), local variables (if any), address of the previous stack frame, and the address of the next instruction to execute after the function returns.

              +--------------------+
High address: | Function arguments |
              +--------------------+
              | Return address     |
              +--------------------+
              | Prev frame address | <-- %rbp/%ebp (frame pointer)
              +--------------------+
 Low address: | Local variables    | <-- %rsp/%esp (stack pointer)
              +--------------------+

The %rsp (64-bit) or %esp (32-bit) register stores the address of the top of the stack (on x86, the stack grows "down" towards decreasing addresses), and the %rbp (64-bit) or %ebp (32-bit) register stores the address of the stack frame. Function arguments and local variables are referred to via offsets from the frame pointer, such as

-4(%rbp) -- object starting 4 bytes "below" current frame address
32(%rbp) -- object starting 32 bytes "above" current frame address

Here's an example - we have a function foo that takes two int arguments and has two int local variables:

#include  <stdio.h>

void foo( int x, int y )
{
  int a;
  int b;

  a = 2 * x + y;
  b = x - y;

  printf( "x = %d, y = %d, a = %d, b = %d\n", x, y, a, b );

}

Here's the generated assembly for that function (MacOS 10.13, LLVM version 9.1.0):

        .section        __TEXT,__text,regular,pure_instructions
        .macosx_version_min 10, 13
        .globl  _foo                    ## -- Begin function foo
        .p2align        4, 0x90
_foo:                                   ## @foo
        .cfi_startproc
## BB#0:
        pushl   %ebp
Lcfi0:
        .cfi_def_cfa_offset 8
Lcfi1:
        .cfi_offset %ebp, -8
        movl    %esp, %ebp
Lcfi2:
        .cfi_def_cfa_register %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $60, %esp
Lcfi3:
        .cfi_offset %esi, -20
Lcfi4:
        .cfi_offset %edi, -16
Lcfi5:
        .cfi_offset %ebx, -12
        calll   L0$pb
L0$pb:
        popl    %eax
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        leal    L_.str-L0$pb(%eax), %eax
        movl    8(%ebp), %esi
        shll    $1, %esi
        addl    12(%ebp), %esi
        movl    %esi, -16(%ebp)
        movl    8(%ebp), %esi
        subl    12(%ebp), %esi
        movl    %esi, -20(%ebp)
        movl    8(%ebp), %esi
        movl    12(%ebp), %edi
        movl    -16(%ebp), %ebx
        movl    %eax, -24(%ebp)         ## 4-byte Spill
        movl    -20(%ebp), %eax
        movl    %eax, -28(%ebp)         ## 4-byte Spill
        movl    -24(%ebp), %eax         ## 4-byte Reload
        movl    %eax, (%esp)
        movl    %esi, 4(%esp)
        movl    %edi, 8(%esp)
        movl    %ebx, 12(%esp)
        movl    -28(%ebp), %esi         ## 4-byte Reload
        movl    %esi, 16(%esp)
        movl    %edx, -32(%ebp)         ## 4-byte Spill
        movl    %ecx, -36(%ebp)         ## 4-byte Spill
        calll   _printf
        movl    %eax, -40(%ebp)         ## 4-byte Spill
        addl    $60, %esp
        popl    %esi
        popl    %edi
        popl    %ebx
        popl    %ebp
        retl
        .cfi_endproc
                                        ## -- End function
        .section        __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz  "x = %d, y = %d, a = %d, b = %d\n"


.subsections_via_symbols

Here's what our stack frame will look like:

              +---+
High address: | y |
              +---+
              | x |
              +---+
              |   | return address
              +---+
              |   | address of previous frame
              +---+
              | a |
              +---+
              | b |
              +---+

Now, that's how things look in 32-bit world. 64-bit gets a little more complicated - some function arguments are passed in registers rather than on the stack, so the nice neat picture above breaks down.

Now, I'm talking about the concept of a variable at runtime, which is what I think you were asking about.

Ah, so I was missing the context of how a variable is translated and incorrectly thinking of them as existing at runtime in some way. It is not that variables are stored somewhere in memory, but they are directly replaced in the program text by the offset they refer to. Their type is encoded only by the machine code instructions required to manipulate them. While a pointer is a variable which stores an actual address, and does exist at runtime. Would you say that is an accurate summary? — Kyle Chadha, Aug 03 '18 at 12:09
@KyleChadha: Variables *do* exist at runtime, as regions of memory into which values are stored. Those regions are typically referred to via an address or an offset from an address. What may or may not exist at runtime is metadata about the variable, and that depends on the language or runtime environment. I made a hash of that explanation and need to rework it. Look for an edit later today. — John Bode, Aug 03 '18 at 14:19

score 0 · Answer 2 · answered Aug 02 '18 at 16:41

I always assumed (perhaps incorrectly) that a variable under the hood must contain a name, type, and the address of some location in memory.

This is wrong, at least for C11. The definitive reference is the standard specification, e.g. n1570 (actually a late draft identical to the ISO standard).

In practice, a variable is usually some memory location. It has some value, but its name and type are forgotten at runtime. Only the compiler knows about the name and type of the variable. The compiler may sometimes (under the as-if rule) eliminate a variable entirely when optimizing.

A pointer doesn't refer to a variable, but to a memory location.

Read also about undefined behavior.

You mentioned the C11 standard. That doesn't even define what "variable" means. (Also, it does define "memory location" in 3.14, but not in the way that the Wikipedia "memory address" article does. A memory location is the memory used to store a single scalar object, not the address of that memory.) — Mike Housky, Aug 02 '18 at 19:24

score 0 · Answer 3 · answered Aug 02 '18 at 16:44

A variable is an abstract C concept and has a name only in the source code. (The name is also placed in the object file, if the compiler generates one, but that is outside the scope of this discussion.) In the compiled (and potentially linked) executable file there are no variables; there are only locations in memory or registers, which are manipulated by machine code instructions.

A variable is a language abstraction and does not exist outside the source code. In the source code, a variable has a name and a type. In the executable file, variables as we understand them in the C language do not exist.

Helpful to think of a variable as an abstraction that exists only in source code -- for all functional purposes then, variables are values, while pointers are addresses to values -- thanks! — Kyle Chadha, Aug 03 '18 at 12:11

Alex Johnson · Answer 4 · 2018-08-02T16:40:54.863

-1

A variable is a symbolic representation of a particular location in memory.

That location holds the type and value that is associated with the variable. The data that is stored in that memory location can change, hence the term variable.

And as pointed out above, a pointer is a variable in which the type is pointer and the value is an address in memory.

edited Aug 02 '18 at 16:40

answered Aug 02 '18 at 16:26


So, a function name is a variable? – Sourav Ghosh Aug 02 '18 at 16:29
@SouravGhosh that's bad logic; a car is a vehicle does not mean that a vehicle is a car. – davmac Aug 02 '18 at 16:35
That's wrong. A variable is not a name at all. A property of a thing is not the thing. – too honest for this site Aug 02 '18 at 16:35
@davmac sorry, where was it written that a variable is a function name? (going by your argument....) – Sourav Ghosh Aug 02 '18 at 16:36
@SouravGhosh What does this have to do with function names? – Alex Johnson Aug 02 '18 at 17:01
@SouravGhosh if that's not implied by this answer, how is your comment relevant - where or how does the answer imply that "a function name is a variable"? (I don't think you're "going by my argument", rather, you've misunderstood it). I see "a particular location in memory" as _potentially_ describing a function - what does your comment mean, if not referring to that? – davmac Aug 03 '18 at 08:54

score -1 · Answer 5 · answered Aug 02 '18 at 16:36

A variable represents an object that contains some data and has a certain type. The data is always stored physically in binary form in memory. The type of the object decides how the data is interpreted.

So a pointer is an object whose data is an address in memory and whose type is "a pointer to another object".

The object that a variable refers to is also a concrete object, so it has properties like size and location (memory address) of its own. This does not interfere with what it contains (what its "data" is).

Consider a real-life example. A road sign may point to another location, but the sign itself is located somewhere. The two locations do not interfere and can even be the same (though usually no one does that).
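The road-sign picture can be written down directly in C. In this sketch (the names `sign_stands_elsewhere` and `show_sign` are invented), the "sign" is a `void *` variable: where it stands is `&sign`, and where it points is its value. A pointer initialized with its own address is the unusual case where both locations coincide.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the pointer's own location differs from the location
   it points to, 0 if the two coincide. */
int sign_stands_elsewhere(void **sign)
{
    assert(sign != NULL);
    return (void *)sign != *sign;
}

/* Print where the sign stands and where it points. */
void show_sign(void **sign)
{
    assert(sign != NULL);
    printf("sign stands at %p and points to %p\n", (void *)sign, *sign);
}
```

With `int town = 5; void *sign = &town;` the function returns 1, because `sign` and `town` are separate objects. With `void *self = &self;` it returns 0: the pointer holds its own address.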

Sourav Ghosh · Answer 6 · 2018-08-02T16:49:42.003

Copying blatantly from the Wikipedia article (emphasis mine):

In computer programming, a variable or scalar is a storage location (identified by a memory address) paired with an associated symbolic name (an identifier), which contains some known or unknown quantity of information referred to as a value. The variable name is the usual way to reference the stored value, in addition to referring to the variable itself, depending on the context. This separation of name and content allows the name to be used independently of the exact information it represents. The identifier in computer source code can be bound to a value during run time, and the value of the variable may thus change during the course of program execution. [....]

Then, going forward

... so variables must also be pointers ...

Well, not really. Just because variables are associated with an address does not make them of pointer type.

To add, a pointer can also be a variable: basically a variable which holds the address of another object (variable or constant).

To present it graphically, let's consider two variables,

int a - a variable named a of type int, (assume integer size is 4 bytes)
int *ptr - a variable named ptr of type int *, (assume pointer size is 4 bytes)

and we say, ptr = &a;

As above:

 +--------------------+
 |                    |
 |                    |
 +--------------------+

 8000       a         8003

 // Here you have four bytes, ranging from 8000 to 8003, indicated by variable `a`

 +----------------------+
 |                      |
 |                      |
 +----------------------+

9000        ptr         9003

 // Here you have four bytes, ranging from 9000 to 9003, indicated by variable `ptr`
 // next, as we say ptr = &a;

 +----------------------+
 |                      |
 |         8000         |
 +----------------------+

9000        ptr         9003

//ptr now holds the address of variable `a`.
// however, ptr still has its own address, as it is itself a variable.
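The same picture, expressed as a small sketch in C. The helper names `store_through` and `describe` are invented here, and the addresses 8000 and 9000 in the diagram are placeholders: a real run prints whatever addresses the implementation chose. Writing through `ptr` changes `a`, because `ptr` holds the address of `a`, while `ptr` itself sits at a different address.

```c
#include <assert.h>
#include <stdio.h>

/* Store a new value in whatever int the pointer refers to. */
void store_through(int *ptr, int value)
{
    assert(ptr != NULL);
    *ptr = value;
}

/* Print where a lives and what it holds, then where ptr lives and
   what it holds (which is the address of a). */
void describe(int *a_addr, int **ptr_addr)
{
    assert(a_addr != NULL && ptr_addr != NULL);
    printf("a   lives at %p and holds %d\n", (void *)a_addr, *a_addr);
    printf("ptr lives at %p and holds %p\n",
           (void *)ptr_addr, (void *)*ptr_addr);
}
```

After `int a = 0; int *ptr = &a; store_through(ptr, 42);` the variable `a` holds 42, and `describe(&a, &ptr)` shows that the value stored in `ptr` equals the address of `a`.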

A subtle point. Variables are not necessarily stored in memory. They can be stored in register (well, it *is* memory in the broad sense..). Or not stored at all.... — Eugene Sh., Aug 02 '18 at 16:47
@EugeneSh. but it has an address, nonetheless (to be accessed). Whether we can use the unary `&` or not, well, you already covered that. :) — Sourav Ghosh, Aug 02 '18 at 16:55

score -1 · Answer 7 · answered Aug 02 '18 at 16:45

Suppose you have these variables:

int x = 1234;
void f(void) {
    int y = 4567;
}

The compiler discards the names x and y and the type int and just remembers the addresses of the variables (it can keep names in a symbol map for debugging, but these are not needed otherwise).

Static variables like x have a fixed address, so when your code does something with x you are telling the compiler to do something with the value held at this fixed address.

Automatic variables like y are often held in a register. The compiler generates code to look at the value of that register. Or the compiler might store it in f()'s stack frame, so it will generate code that looks at the content of an address that is offset from that stack frame address.

A variable that is a pointer works in the same way, except that instead of storing a value like an int it stores the address of another variable.

answered Aug 02 '18 at 16:45

James


There are neither registers nor a stack in C, not even compiler is mandatory or where names are dropped or addresses used. – too honest for this site Aug 02 '18 at 16:46
I didn't say there were registers. I just said what compilers typically do. With C being so close to the bare metal it is often helpful to understand what the bare metal really is (in most if not all cases). Do you know of any C compiler that does not use a stack or use the CPU's registers? – James Aug 02 '18 at 16:51
1) You are conflating the standard and a **specific** implementation you have in mind. And yes, there are implementations which don't _necessarily_ use a/the stack. One of them is for ARM, others are for certain bare-metal targets you interestingly mentioned. In general, mixing abstraction levels never yields good explanations. OP needs to clarify. – too honest for this site Aug 02 '18 at 16:56
