So, quick crash course on pointers in C.
First of all, why do we use pointers in C? Basically, we have to use pointers for the following reasons:
- To allow functions to modify their parameters
- To track dynamically allocated memory
Pointers come in handy in other ways, because they offer a form of indirection. They allow us to manipulate other objects without having to know the other objects' names.
Indirection is a powerful tool in programming, one that you've already seen if you've dealt with arrays. Instead of creating 100 unique integer variables, we can create an array of integers, and use subscripting to refer to a specific object. That is, instead of writing
```c
int var0 = 0;
int var1 = 1;
int var2 = 2;
...
int var99 = 99;
```
we can write
```c
int var[100];
for ( int i = 0; i < 100; i++ )
  var[i] = i;
```
Array subscript notation allows us to refer to an object indirectly, rather than by a unique name. It provides a shortcut for managing large numbers of objects by referring to them with a single expression. Pointers serve much the same purpose. Suppose we have several integer variables named x, y, and z. We can create a pointer p to refer to each one in turn:
```c
int x = 10;
int y = 20;
int z = 30;
int *p;
```
Let's start by playing with x. We set p to point to x using the unary & address-of operator:
```c
p = &x; // int * = int *
```
The type of the variable p is int *. The type of the expression &x is int *, and the value of the expression is the address of x. We can then change the value of x through p using the unary * indirection operator:
```c
*p = 15; // int = int
```
Since the type of the variable p is int *, the type of the expression *p is int, and that expression designates the same object that the expression x does. So in the line above, we're changing the value stored in x indirectly through p. We can do the same thing with y and z:
```c
p = &y;
*p = 25;
p = &z;
*p = 35;
```
Okay, cool, but why not just assign to x, y, and z directly? Why go through the pain of assigning their addresses to p and assigning values through *p?
Normally we wouldn't do it that way, but there's a case where it can't be avoided. Suppose we want to write a function that modifies the value of one or more of its parameters, like so:
```c
void foo( int x )
{
  x = 2 * x;
}
```
and call it like this:
```c
#include <stdio.h> // for printf

int main( void )
{
  int val = 2;
  printf( "before foo: %d\n", val );
  foo( val );
  printf( "after foo: %d\n", val );
  return 0;
}
```
What we want to see is
```
before foo: 2
after foo: 4
```
but what we get is
```
before foo: 2
after foo: 2
```
It doesn't work because C uses a parameter-passing convention called "pass-by-value": the formal parameter x in the function definition designates a different object in memory from the actual parameter val, and the value of val is copied into x when foo is called. Writing a new value to x doesn't affect val. In order for foo to modify the actual parameter val, we must pass a pointer to val:
```c
#include <stdio.h> // for printf

void foo( int *x ) // x == &val
{                  // *x == val
  *x = *x * 2;
}

int main( void )
{
  int val = 2;
  printf( "before foo: %d\n", val );
  foo( &val );
  printf( "after foo: %d\n", val );
  return 0;
}
```
Now we get the output we expect: val is modified by foo. The expression *x refers to the same object that val does. And now we can write something like
```c
foo( &y ); // x == &y, *x == y
foo( &z ); // x == &z, *x == z
```
This is our first use case - allowing a function to modify its parameters.
There are times during a program's execution when you need to allocate some extra memory. Since this allocation occurs at runtime, there's no way to give this extra memory a name the way you do for a regular variable. In other words, there's no way to write
```c
int x = new_memory();
```
because variable names don't exist at runtime (they're not preserved in the generated machine code). Again, we must refer to this dynamically allocated memory indirectly through a pointer:
```c
int *p = malloc( sizeof *p ); // sizeof *p == sizeof (int); malloc is declared in <stdlib.h>
```
This allocates enough space for a single int object and assigns the address of that new space to p. You can allocate blocks of arbitrary size:
```c
int *arr = malloc( sizeof *arr * 100 );
```
allocates enough space for 100 int objects and sets arr to point to the first of them.
This is our second use case - tracking dynamically allocated memory.
A quick note on pointer syntax. There are two operators associated with pointer operations. The unary & operator is used to obtain the address of an object, while the unary * operator dereferences a pointer. Assume we have an int object named x and a pointer to an int named p:
```c
p = &x;  // assign the address of x to p
*p = 10; // assign a new value to whatever p points to
```
In a declaration, the unary * indicates that the thing being declared has pointer type:
```c
int *p; // p has type "pointer to int"
```
When you initialize a pointer in a declaration like

```c
int *p = &x;
```

p is not being dereferenced. That is the same as writing

```c
int *p;
p = &x;
```
The * binds to the thing being declared, not to the type specifier. You can write that same declaration as

```c
int* p;
int * p;
int*p;
```

and it will always be parsed as

```c
int (*p);
```
For any type T, the following are true:

```c
T *p;      // p has type "pointer to T"
T *p[N];   // p has type "array of pointers to T"
T (*p)[N]; // p has type "pointer to array of T"
T *p();    // p has type "function returning pointer to T"
T (*p)();  // p has type "pointer to function returning T"
```
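Complex pointer declarations can get hard to read, since the unary * is a prefix operator and has lower precedence than the postfix [] and () operators. To decode one, start at the identifier and work outward, applying any [] or () before the * to its left, with parentheses overriding that order as usual. For example:

```c
T *(*(*foo)())[N];
```

foo is a pointer to a function returning a pointer to an N-element array of pointers to T.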
With your code, we're dealing with the first case: we want the readstudent function to modify the contents of an existing instance of struct student. And readstudent does that by calling scanf to read values into each separate member:
```c
scanf("%s", pstu->name);
scanf("%f", &pstu->grade1);
scanf("%f", &pstu->grade2);
```
Remember that scanf expects every argument after the format string to be a pointer: again, we're trying to modify the contents of an object, so we have to pass a pointer to that object as the argument. Since -> binds more tightly than unary &, &pstu->grade1 is parsed as &(pstu->grade1) and evaluates to the address of the grade1 member of the object that pstu points to. Likewise, &pstu->grade2 evaluates to the address of the grade2 member of the object that pstu points to.
So what the heck is going on with pstu->name?
Arrays are special in C. Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration like

```c
char foo[] = "test";
```

an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
We won't go into the weeds on this, but this was a deliberate design decision by Ritchie when he was first creating the C language, and it does serve a purpose. It also means that arrays lose their "array-ness" under most circumstances: what you wind up dealing with is actually a pointer to the first element, not the whole array itself. In the case of the scanf call, we're passing the equivalent of &pstu->name[0].