A declaration states the type of a variable (which also helps determine how much space it will need).
A definition states the type of a variable, and also allocates space for the variable.
So yes, every definition is a declaration.
This is a declaration:
extern int j;
After this declaration, the compiler knows that j is of type int. After this declaration, we can say something like
printf("%d\n", j);
We can say that, and the compiler will not complain (at least, it will not complain right away), because at least for the moment, the compiler has all the information it needs.
However, if we try to compile this complete program:
#include <stdio.h>
extern int j;
int main()
{
    printf("%d\n", j);
}
we will likely get an error. On my compiler, I get the errors
Undefined symbols: "_j", referenced from: _main
ld: symbol(s) not found
The error means that although the compiler knew what j was, the final build process (ld) did not find an actual definition of j anywhere.
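One way to make the link succeed is to supply a definition of j somewhere in the build. Here is a minimal sketch, assuming for brevity that the definition sits in the same file (in a real project it would more likely live in another source file). The helper read_j and the value 42 are made up for illustration.

```c
extern int j;   /* declaration: states the type, allocates nothing */

/* Hypothetical helper: compiles with only the declaration in view. */
int read_j(void)
{
    return j;
}

int j = 42;     /* definition: allocates storage, so the linker can resolve j */
```

With the last line removed, the file would still compile, but linking would fail as shown above unless some other object file defined j.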
You said,
"here in the above program variable j just declared, not defined,"
No, in your program j is defined.
"int *i, j is declaration or definition?"
It is a definition, of two variables. It defines j of type int, and i of type int *, or pointer-to-int.
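The * attaches to the individual declarator, not to the base type, which is why only i ends up as a pointer. A small sketch that checks this, assuming a C11 compiler (for _Generic); the function name is hypothetical:

```c
/* Hypothetical check: in "int *i, j;" only i is a pointer; j is a plain int. */
int declarator_demo(void)
{
    int *i, j;   /* defines two variables: i (int *) and j (int) */

    /* _Generic selects on the type of its operand without evaluating it. */
    return _Generic(i, int *: 1, default: 0)
        && _Generic(j, int: 1, default: 0);
}
```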
It looks like you've started working with pointers, and when you're working with pointers, you have to think about memory allocation more carefully. (You may also have to think about the distinction between static and dynamic allocation.)
The first thing to understand is that the compiler is perfectly capable of allocating memory. The compiler allocates memory for variables all the time. It's only when you start using pointers that you have to start worrying about memory allocation, and doing your own memory allocation.
As we've said, the line
int *i;
is a definition. It states the type of, and allocates space for, a variable named i, of type pointer-to-int.
But in this case, what we've allocated space for is the pointer. We have not allocated any space for what the pointer points to. In fact, this pointer points nowhere yet: it is uninitialized.
If we say
int j;
int *i = &j;
now the pointer i does point somewhere: it points to the variable j. Now, not only have we allocated memory for the pointer i, we have allocated memory for it to point to.
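Because i points at j's storage, writing through i changes j. A minimal sketch (the function name is made up for illustration):

```c
/* Hypothetical demo: a store through the pointer lands in j. */
int store_through_pointer(void)
{
    int j = 0;      /* the compiler allocates j */
    int *i = &j;    /* ...and i, which now points at j */

    *i = 5;         /* writes into the memory allocated for j */
    return j;
}
```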
The other way to allocate memory for a pointer to point to is to do dynamic memory allocation, typically by calling malloc. If we say
int *i = malloc(sizeof(int));
again we have allocated memory both for i, and what it points to. We let the compiler allocate memory for i, and we called malloc to dynamically allocate memory for i to point to.
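In real code the result of malloc should be checked, and the memory eventually released with free. A sketch of that pattern, where make_int is a hypothetical helper and not anything from the question:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated int holding value, or NULL. */
int *make_int(int value)
{
    int *i = malloc(sizeof *i);   /* i: allocated by the compiler; *i: by malloc */

    if (i == NULL)
        return NULL;              /* malloc can fail */
    *i = value;
    return i;                     /* the caller is responsible for free() */
}
```

A caller would write something like int *p = make_int(9); and, after checking p against NULL and using it, call free(p).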
Some pictures may help. Here's the situation after the first scrap of code:
   +-----------+
j: |           |
   +-----------+
         ^
         |
   +-----|-----+
i: |     *     |
   +-----------+
Both of those boxes were allocated for us by the compiler.
Here's the situation after the second scrap of code:
   +-----------+         +-----------+
i: |     *-------------> |           |
   +-----------+         +-----------+
The box on the left, for i, is the one the compiler allocated for us, and the unnamed box on the right is the one we got from malloc.
What about a declaration? What does
extern int j;
look like? I guess I'd draw that like this:
j:
There's a symbol (a name) j, but there's no box next to it yet (although if there were a box, it would be of the right size and shape to hold an int).
In a comment you asked,
"given int j; int *i=&j; we are not knowing the memory location of j, so how can we allocate its address to i?"
I'm not sure what you mean by "we are not knowing the memory location of j". We may not know the address, but the compiler does. The compiler does whatever it takes to make sure that space is allocated for j, at a known location. The expression &j literally means, "fetch the known location of the variable j". And since
int j;
is a definition, it means that j will be allocated space at a known location.
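One visible consequence: at file scope, C accepts &j as an initializer, because the address of a variable with static storage counts as an address constant even though nobody has written the number down. A minimal sketch (points_at_j is a made-up name):

```c
int j;          /* definition: j gets storage at a location fixed by the build */
int *i = &j;    /* legal at file scope: &j is an address constant */

/* Hypothetical check: i really does refer to j's storage. */
int points_at_j(void)
{
    *i = 11;
    return i == &j && j == 11;
}
```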
In all of this I've been saying things like "the compiler allocates memory for...", but in reality it's a little more complicated than that. The actual allocation of actual memory may happen later.
For local variables, the compiler "allocates" variables on the stack. What it actually (typically) does is set aside space in the function's stack frame. The "address" of a local variable is, in a sense, its offset from the base of the stack frame. The variable's actual address will not be known until the program is running, and the function is called, and its stack frame is created, at a particular address on the stack.
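A recursive function makes this visible: the same local variable gets a different address in each active call. The sketch below (both function names are hypothetical) counts how many nested calls see their own local at a different address than their caller's. It relies only on the guarantee that two live objects have distinct addresses, not on any particular stack layout.

```c
#include <stddef.h>

/* Hypothetical helper: counts nested calls whose `local` lives at a
   different address than the caller's `local`. */
static int count_distinct(int depth, const int *caller_local)
{
    int local = depth;   /* one instance of `local` per active call */
    int differs = (caller_local != NULL && caller_local != &local);

    if (depth == 0)
        return differs;
    return differs + count_distinct(depth - 1, &local);
}

/* Starts the recursion; the outermost call has no caller to compare with. */
int distinct_frames(int depth)
{
    return count_distinct(depth, NULL);
}
```

Calling distinct_frames(3) makes four nested calls, and each of the three inner ones finds its local at a new address, so the result is 3.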
For global variables, the compiler makes what we might call tentative allocations. C is set up for separate compilation, meaning that the compiler may be invoked multiple times, on multiple source files, creating multiple object files, which are later combined by a separate "linker" program into a single executable. The compiler gives each defined global variable a tentative address in the data segment of a single object file. (And in reality it may be more complicated than that, if the compiler is emitting assembly language, which is being assembled by a separate assembler. In that case the compiler emits directives which cause the assembler to perform these tentative allocations.)
When the linker combines several object files together, it has to combine their partial, tentative data segments together, too. This process involves giving new addresses to most/all of the global variables. The compiler has taken care to emit extra information, called "relocation information", describing not only the definition of each global symbol, but also each place in the code where the symbol is referenced. That way, when the linker moves global symbols into their final positions, it can go back and adjust all the references -- all the calls to global functions, and all the places where global variables have their addresses taken with &. (In other words, yes, the linker is going back and tinkering with the machine code generated by the compiler and/or the assembler.)
In this scheme, external declarations result in slightly different information in the object file. Rather than a tentative allocation (with a tentative address, which will typically be adjusted later by the linker, along with a list of references that may need to be relocated), the compiler emits its own version of an external declaration, telling the linker that it will need to find the actual definition of this symbol in some other object file. (It essentially says that the symbol has a tentative address of 0, which will always be adjusted later.) But the compiler emits the same sorts of relocation information, so that when (if) the linker finds the symbol in the other object file, it can fill in that address at all spots where it's referenced.
So if we looked at the low-level code emitted by the compiler for the code
int *i = &j;
we would find one of three things, depending on whether j was a local, global (that is, defined global), or external global variable.
For a local variable, the compiler will emit code to take the base address of the stack frame (whatever that ends up being at run-time), add to it the offset of variable j (which the compiler knows), and store the resulting address into i.
For a global variable defined in the same source file, the compiler will emit code to take j's tentative address and store it into i. But it will also emit a relocation record pointing at the instruction where j's address is stored into i, so that the linker can adjust the address to be j's actual, final address.
For an external global variable, the compiler will emit code to take an unknown address (typically the tentative address '0') and store it into i. It will emit a relocation record pointing at the instruction where j's address is stored into i, so that the linker can fill it in with j's actual address.
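At the source level the three cases look identical; only the storage of the variable differs. Here is a sketch with all three side by side. The names loc_j, glob_j, ext_j and sum_via_pointers are invented for illustration, and ext_j is defined at the bottom of the same file only so the example links on its own; normally that definition would be in another object file. The comments describe what a typical compiler and linker do, not something the C standard requires.

```c
extern int ext_j;   /* external: only declared at this point */
int glob_j = 2;     /* global defined in this source file */

/* Hypothetical demo: takes the address of each kind of variable. */
int sum_via_pointers(void)
{
    int loc_j = 1;           /* local: lives in this call's stack frame */

    int *p_loc  = &loc_j;    /* typically frame base + offset, at run time */
    int *p_glob = &glob_j;   /* typically a tentative address, later relocated */
    int *p_ext  = &ext_j;    /* typically a placeholder the linker fills in */

    return *p_loc + *p_glob + *p_ext;
}

int ext_j = 3;      /* stands in for a definition in some other object file */
```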
(Again, in all of this I've been referring to stack frames, which C doesn't require and which C programs on some architectures don't necessarily use. But they're a decent way to think about what happens, and your machine probably uses them.)