
These lines are quoted directly from Dennis Ritchie's C programming book:

Declaration refers to places where the nature of the variable is stated but no storage is allocated. Definition refers to the place where the variable is created or assigned storage.


Question

What exactly is the difference between "variable is created" and "variable is stated"?

In the following program, variable j is just declared, not defined. So how come there is no compilation error? We are assigning the address of j to I, yet j has not yet been allocated storage (during compilation).

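#include <stdio.h>

int main()
{
    int *I, j;
    I = &j;
    printf("%lu", I);   /* undefined behaviour: pointers should be printed with %p (see the comments) */
}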

output: 140721741346812 

Is int *I, j a declaration or a definition? My understanding is that it is a definition, but this link says it is both a declaration and a definition.

I'm confused by the two distinct concepts given in the link and in Ritchie's book.

Where am I going wrong? Is value assignment postponed until run time, so that at run time j is given a garbage value, and hence there is no error?

Student
aambazinga
  • Sidenote: `printf("%lu", pointer);` is undefined behaviour; you need to print pointers with the `%p` format specifier. In fact, on x64 Windows, not even the type sizes match (unsigned long: 4, pointer: 8). – Aconcagua Jul 28 '18 at 12:58
  • `int i;` already allocates memory as well, even without assigning a value. An example for only declaring a variable (global this time) would be `extern int i;` – Aconcagua Jul 28 '18 at 13:00
  • In standard [C11 §6.7 Declarations ¶5](https://port70.net/~nsz/c/c11/n1570.html#6.7p5), it says: _A declaration specifies the interpretation and attributes of a set of identifiers. A_ definition _of an identifier is a declaration for that identifier that:_ and goes on to enumerate 4 features, any one of which makes a specific declaration into a definition. Thus, **all definitions are declarations; some declarations are not definitions**. – Jonathan Leffler Jul 28 '18 at 19:15
  • @Jonathan Leffler This question is not a duplicate of the question you referred to. That question directly asks for the definitions of the two terms (declaration and definition), and the answer I was seeking was not there (one explaining more about how exactly, and when, memory is allocated to variables). The answer provided here differs in many respects from the answers to that question. This is why I think this question is different from that one. – aambazinga Jul 29 '18 at 10:28
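Following the first comment, a minimal sketch of printing the address portably. The helper name format_address is hypothetical; it writes into a buffer with snprintf so the result can be checked, but in the original program the one-line equivalent is printf("%p\n", (void *)I);.

```c
#include <stdio.h>

/* Hypothetical helper: formats the address of a local int the portable way.
   %p expects a void *, so the int * is converted explicitly. */
int format_address(char *buf, size_t size)
{
    int j;          /* j is defined here, so it already has an address */
    int *I = &j;    /* taking the address does not require j to hold a value */
    return snprintf(buf, size, "%p", (void *)I);
}
```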


A declaration states the type of a variable (which also helps determine how much space it will need).

A definition states the type of a variable, and also allocates space for the variable.

So yes, every definition is a declaration.

This is a declaration:

extern int j;

After this declaration, the compiler knows that j is of type int. After this declaration, we can say something like

printf("%d\n", j);

We can say that, and the compiler will not complain (at least, it will not complain right away), because at least for the moment, the compiler has all the information it needs.

However, if we try to compile this complete program:

#include <stdio.h>

extern int j;

int main()
{
    printf("%d\n", j);
}

we will likely get an error. On my compiler, I get the errors

Undefined symbols: "_j", referenced from: _main
ld: symbol(s) not found

The error means that although the compiler knew what j was, the final build process (ld) did not find an actual definition of j anywhere.
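For contrast, a minimal single-file sketch where the link succeeds: the same extern declaration, plus a definition of j (the helper name read_j is hypothetical). The definition comes after the function that uses j, which is fine, because the declaration already told the compiler everything it needed; in a real multi-file program the definition would typically live in another source file.

```c
/* Declaration only: the compiler learns that j is an int defined
   somewhere, but no storage is allocated here. */
extern int j;

/* Compiles against the declaration alone. */
int read_j(void)
{
    return j;
}

/* Definition: storage for j is allocated here. In a multi-file program
   this line could live in any other source file instead. */
int j = 42;
```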

You said,

here in the above program variable j just declared, not defined,

No, in your program j is defined.

int *i, j is declaration or definition?

It is a definition, of two variables. It defines j of type int, and i of type int *, or pointer-to-int.
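To see that int *i, j; really does allocate j, here is a minimal sketch of the same two lines inside a hypothetical function, define_then_use: the address of j can be taken immediately, and reading j through i gives back whatever was stored in j.

```c
/* Hypothetical function mirroring the question's code. */
int define_then_use(void)
{
    int *i, j;   /* definition of two variables: i is an int *, j is an int */
    i = &j;      /* fine: j already has storage, hence an address */
    j = 5;       /* j had an indeterminate value until this assignment */
    return *i;   /* reads j through the pointer */
}
```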

It looks like you've started working with pointers, and when you're working with pointers, you have to think about memory allocation more carefully. (You may also have to think about the distinction between static and dynamic allocation.)

The first thing to understand is that the compiler is perfectly capable of allocating memory. The compiler allocates memory for variables all the time. It's only when you start using pointers that you have to start worrying about memory allocation, and doing your own memory allocation.

As we've said, the line

int *i;

is a definition. It states the type of, and allocates space for, a variable named i, of type pointer-to-int.

But in this case, what we've allocated space for is the pointer. We have not allocated any space for what the pointer points to. In fact, this pointer points nowhere yet: it is uninitialized.

If we say

int j;
int *i = &j;

now the pointer i does point somewhere: it points to the variable j. Now, not only have we allocated memory for the pointer i, we have allocated memory for it to point to.
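A small sketch of the same idea (the function name write_through_pointer is hypothetical). It starts i off as NULL, an explicit "points nowhere", rather than leaving it uninitialized, then points it at j and writes through it.

```c
#include <stddef.h>

int write_through_pointer(void)
{
    int j = 0;
    int *i = NULL;   /* explicitly "points nowhere", unlike an uninitialized pointer */
    i = &j;          /* now i points at the storage allocated for j */
    *i = 17;         /* stores into j via i */
    return j;
}
```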

The other way to allocate memory for a pointer to point to is to do dynamic memory allocation, typically by calling malloc. If we say

int *i = malloc(sizeof(int));

again we have allocated memory both for i, and what it points to. We let the compiler allocate memory for i, and we called malloc to dynamically allocate memory for i to point to.
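The malloc case as a complete sketch (new_int is a hypothetical helper). Two details the one-liner leaves out: malloc can return NULL, and memory obtained from malloc must eventually be released with free.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a pointer to a newly malloc'd int holding
   value, or NULL if the allocation fails. The caller must free() it. */
int *new_int(int value)
{
    int *i = malloc(sizeof *i);  /* the compiler allocates i; malloc allocates *i */
    if (i != NULL)
        *i = value;
    return i;
}
```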

Some pictures may help. Here's the situation after the first scrap of code:

   +-----------+
j: |           |
   +-----------+
         ^
         |
   +-----|-----+
i: |     *     |
   +-----------+

Both of those boxes were allocated for us by the compiler.

Here's the situation after the second scrap of code:

   +-----------+         +-----------+
i: |     *-------------> |           |
   +-----------+         +-----------+

The box on the left, for i, is the one the compiler allocated for us, and the unnamed box on the right is the one we got from malloc.

What about a declaration? What does

extern int j;

look like? I guess I'd draw that like this:

j:

There's a symbol (a name) j, but there's no box next to it yet (although if there were a box, it would be of the right size and shape to hold an int).


In a comment you asked,

given int j; int *i=&j; we are not knowing the memory location of j, so how can we allocate its address to i?

I'm not sure what you mean by "we are not knowing the memory location of j". We may not know the address, but the compiler does. The compiler does whatever it takes to make sure that space is allocated for j, at a known location. The expression &j literally means, "fetch the known location of the variable j". And since

int j;

is a definition, it means that j will be allocated space at a known location.


In all of this I've been saying things like "the compiler allocates memory for...", but in reality it's a little more complicated than that. The actual allocation of actual memory may happen later.

For local variables, the compiler "allocates" variables on the stack. What it actually (typically) does is set aside space in the function's stack frame. The "address" of a local variable is, in a sense, its offset from the base of the stack frame. The variable's actual address will not be known until the program is running, the function is called, and its stack frame is created at a particular address on the stack.
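To make "not known until the function is called" concrete, a small sketch (the name distinct_frames and the DEPTH constant are hypothetical): a recursive function records where each call's j lives. Each activation gets its own j with its own address, chosen at run time; since C guarantees that distinct live objects have distinct addresses, this always returns true. The actual address values will typically differ from run to run on many systems.

```c
#include <stdbool.h>

#define DEPTH 8   /* hypothetical recursion depth for the demo */

/* Hypothetical demo: each call stores the address of its own local j in
   addrs[depth], then recurses. At the innermost call every recorded j is
   still alive, so their addresses can be compared. */
bool distinct_frames(const int *addrs[], int depth)
{
    int j = depth;        /* a fresh j in this call's frame */
    addrs[depth] = &j;
    if (depth > 0)
        return distinct_frames(addrs, depth - 1);

    for (int a = 0; a < DEPTH; a++)
        for (int b = a + 1; b < DEPTH; b++)
            if (addrs[a] == addrs[b])
                return false;   /* two live calls sharing one j: should never happen */
    return true;
}
```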

For global variables, the compiler makes what we might call tentative allocations. C is set up for separate compilation, meaning that the compiler may be invoked multiple times, on multiple source files, creating multiple object files, which are later linked together by a separate "linker" program into a single executable. The compiler gives each defined global variable a tentative address in the data segment of a single object file. (And in reality it may be more complicated than that, if the compiler is emitting assembly language, which is being assembled by a separate assembler. In that case the compiler emits directives which cause the assembler to perform these tentative allocations.)

When the linker combines several object files together, it has to combine their partial, tentative data segments together, too. This process involves giving new addresses to most/all of the global variables. The compiler has taken care to emit extra information, called "relocation information", describing not only the definition of each global symbol, but also each place in the code where the symbol is referenced. That way, when the linker moves global symbols into their final positions, it can go back and adjust all the references -- all the calls to global functions, and all the places where global variables have their addresses taken with &. (In other words, yes, the linker is going back and tinkering with the machine code generated by the compiler and/or the assembler.)
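A minimal sketch of one place where a global's address is taken with &: a file-scope pointer initialized with another global's address (the names counter, counter_ptr, and bump are hypothetical). On a typical toolchain the value stored in counter_ptr cannot be known when this file is compiled, so it is exactly the kind of reference the relocation information exists for.

```c
/* File-scope definition: storage for counter lives in the data/bss segment. */
int counter = 0;

/* File-scope pointer initialized with counter's address. On typical
   toolchains the final address is not known at compile time, so the
   compiler records a relocation that the linker (or loader) fills in. */
int *counter_ptr = &counter;

void bump(void)
{
    (*counter_ptr)++;   /* updates counter through the pointer */
}
```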

In this scheme, external declarations result in slightly different information in the object file. Rather than a tentative allocation (with a tentative address, which will typically be adjusted later by the linker, along with a list of references that may need to be relocated), the compiler emits its own version of an external declaration, telling the linker that it will need to find the actual definition of this symbol in some other object file. (It essentially says that the symbol has a tentative address of 0, which will always be adjusted later.) But the compiler emits the same sorts of relocation information, so that when (if) the linker finds the symbol in the other object file, it can fill in that address at all spots where it's referenced.

So if we looked at the low-level code emitted by the compiler for the code

int *i = &j;

we would find one of three things, depending on whether j was a local, global (that is, defined global), or external global variable.

  1. For a local variable, the compiler will emit code to take the base address of the stack frame (whatever that ends up being at run-time), add to it the offset of variable j (which the compiler knows), and store the resulting address into i.

  2. For a global variable defined in the same source file, the compiler will emit code to take j's tentative address and store it into i. But it will also emit a relocation record pointing at the instruction where j's address is stored into i, so that the linker can adjust the address to be j's actual, final address.

  3. For an external global variable, the compiler will emit code to take an unknown address (typically the tentative address '0') and store it into i. It will emit a relocation record pointing at the instruction where j's address is stored into i, so that the linker can fill it in with j's actual address.
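The three cases can be put side by side in one sketch (all names are hypothetical). One caveat: because ext_j's definition sits at the bottom of this same file so that the sketch links on its own, a real compiler would treat it like case 2; case 3 really happens only when the definition lives in a different source file.

```c
int glob_j = 2;     /* case 2: a global defined in this source file */
extern int ext_j;   /* case 3: declared here; normally defined in another file */

int sum_via_i(void)
{
    int j = 1;      /* case 1: a local in this call's stack frame */
    int *i;
    int total;

    i = &j;         /* frame base + known offset, computed at run time */
    total = *i;

    i = &glob_j;    /* tentative address, fixed up via a relocation record */
    total += *i;

    i = &ext_j;     /* unknown address, filled in by the linker */
    total += *i;

    return total;
}

/* Stands in for the definition in "some other object file" so this links. */
int ext_j = 3;
```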

(Again, in all of this I've been referring to stack frames, which C doesn't require and which C programs on some architectures don't necessarily use. But they're a decent way to think about what happens, and your machine probably uses them.)

Steve Summit
  • Every definition is a declaration, but not every declaration is a definition, right? So you are saying that "extern int i", when placed outside main, is a declaration, but "int i" inside main is both a declaration and a definition? – aambazinga Jul 28 '18 at 13:04
  • But I'm not getting it: how can it be a definition too? We are just telling the compiler, look, i and j are two variables, one of integer pointer type and the other of integer type. But right now no memory allocation has been done. – aambazinga Jul 28 '18 at 13:06
  • *"A declaration states the type of a variable (which also determines how much space it will need)."* => no. `extern char foo[];` - how much memory does `foo` need? – Antti Haapala -- Слава Україні Jul 28 '18 at 13:13
  • in the example you have given, int j; int *i=&j; we are not knowing the memory location of j, so how can we allocate its address to i? – aambazinga Jul 28 '18 at 13:15
  • @AnttiHaapala Good point -- fixed. – Steve Summit Jul 28 '18 at 13:23
  • @aambazinga `extern int i` is a declaration, regardless of whether it's inside or outside of a function. `int i` is a definition, regardless of whether it's inside or outside of a function. (Outside of a function it defines a global variable, inside of a function it defines a local variable.) – Steve Summit Jul 28 '18 at 13:25
  • @aambazinga You keep saying things like "right now no memory allocation has been done", but I am telling you, `int j` *is* a definition and it *does* allocate memory. What do you believe memory allocation is, if `int j` doesn't do it? What more do you believe we have to do? – Steve Summit Jul 28 '18 at 13:27
  • I used to think that during compilation, tokens are created, then syntax analysis and so on, and finally the program is converted into machine language. In between, no memory was allocated to the variables (apart from constants and statics), because if we allocated memory at compile time, we would have to give the variables fixed addresses during compilation, which would not always be known. And if the compiler does memory allocation at compile time, then the program would always have to be loaded at that particular location to run. – aambazinga Jul 28 '18 at 13:35
  • @aambazinga: remember that on most modern operating systems the memory is *virtual memory* and an address is a *virtual address*, not a "real" one in RAM. Mapping between virtual address and real address is done by the operating system and is dynamic, i.e. it can change over the life of a process. – cdarke Jul 28 '18 at 13:44
  • To use a variable at run time, the compiler has to emit some object/executable code that causes it to exist. Depending on the host system and executable file format, the memory allocation may happen at compile time (e.g. info that causes the variable to exist is written to object/executable files) or may be resolved at run time (some code is emitted which is executed at run time, and allocates memory for each variable). The details don't matter. Somewhere in the process between the compiler parsing your source code and your program using that variable while running, space must be allocated for it. – Peter Jul 28 '18 at 13:46
  • @aambazinga To really understand the compilation process at this level, you need to understand the way the *compiler* and the *linker* cooperate (with perhaps an *assembler* in between them). The details here can get very system-dependent, not specified by the C language. I've added some additional explanation of how things work on one typical class of implementations. – Steve Summit Jul 28 '18 at 14:07
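A sketch tying together two points from these comments (all names hypothetical): extern declares without allocating, even inside a function, and a declaration need not say how big the object is (the extern char foo[] example). foo_len only needs foo's address, while sizeof foo works only after the definition has completed the type.

```c
#include <string.h>

int shared = 10;            /* definition at file scope */
extern char foo[];          /* declaration with an incomplete type: size unknown */

int read_shared(void)
{
    extern int shared;      /* block-scope declaration: refers to the global, no new storage */
    int local;              /* block-scope definition: storage for a new local int */
    local = shared + 1;
    return local;
}

size_t foo_len(void)
{
    return strlen(foo);     /* only foo's address is needed, not its size */
}

char foo[] = "hello";       /* definition: completes the type as char[6] and allocates it */
```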