Pointers serve 3 main purposes in C:
- Fake pass-by-reference semantics;
- Track dynamically-allocated memory;
- Build dynamic data structures.
Fake pass-by-reference semantics: in C, all function arguments are passed by value. Given the following snippet:
void foo( int a, int b )
{
a = 1;
b = 2;
}
void bar( void )
{
int x=0, y=1;
foo( x, y );
printf( "x = %d, y = %d\n", x, y );
}
The formal parameters a
and b
in foo
are different objects in memory from the actual parameters x
and y
in bar
, so any changes to a
and b
are not reflected in x
and y
. The output will be "x = 0, y = 1". If you want foo
to alter the values of x
and y
, you will need to pass pointers to those variables instead:
void foo( int *a, int *b )
{
*a = 1;
*b = 2;
}
void bar( void )
{
int x = 0, y = 1;
foo( &x, &y );
printf( "x = %d, y = %d\n", x, y );
}
This time, the formal parameters a
and b
are pointers to the variables x
and y
; writing to the expressions *a
and *b
int foo
is equivalent to writing to x
and y
in bar
. Thus, the output is "x = 1, y = 2".
This is how scanf()
and scores of other library functions work; they use pointers to reference the actual memory we want to operate on.
Track dynamically allocated memory: The library functions malloc
, calloc
, and realloc
allow us to allocate memory at runtime, and all three return pointers to the allocated memory (as of C89, all three return void *
). For example, if we want to allocate an array of int
at run time:
int *p = NULL;
size_t numItems;
// get numItems;
p = malloc( sizeof *p * numItems );
if ( p )
{
// do stuff with p[0] through p[numItems - 1];
}
free( p );
The pointer variable p
will contain the address of the newly allocated block of memory large enough to hold numItems
integers. We can access that memory by dereferencing p
using either the *
operator or the []
subscript operator (*(p+i) == p[i]
).
So why not just declare an array of size numItems
and be done with it? After all, as of C99, you can use a variable-length array, where the size doesn't have to be known until runtime:
// get numItems
int p[numItems];
Three reasons: first, VLA's are not universally supported, and as of the 2011 standard, VLA support is now optional; second, we cannot change the size of the array after it has been declared, whereas we can use realloc
to resize the memory block we've allocated; and finally, VLAs are limited both in where they can be used and how large they can be - if you need to allocate a lot of memory at runtime, it's better to do it through malloc/calloc/realloc
than VLAs.
A quick note on pointer arithmetic: for any pointer T *p
, the expression p+1
will evaluate to the address of the next element of type T
, which is not necessariy the address value + 1. For example:
T sizeof T Original value of p p + 1
- -------- ------------------- -----
char 1 0x8000 0x8001
int 4 0x8000 0x8004
double 8 0x8000 0x8008
Build dynamic data structures: There are times when we want to store data in such a way that makes it easy to insert new elements into a list, or quickly search for a value, or force a specific order of access. There are a number of different data structures used for these purposes, and in almost all cases they use pointers. For example, we can use a binary search tree to organize our data in such a way that searching for a particular value is pretty fast. Each node in the tree has two children, each of which points to the next element in the tree:
struct node {
T key;
Q data;
struct node *left;
struct node *right;
};
The left
and right
members point to other nodes in the tree, or NULL if there is no child. Typically, the left
child points to a node whose value is somehow "less than" the value of the current node, while the right
child points to a node whose value is somehow "greater than" the current node. We can search the tree for a value like so:
int find( struct node *root, T key, Q *data )
{
int result = 0;
if ( root == NULL ) // we've reached the bottom of the tree
{ // without finding anything
result = 0;
}
else if ( root->key == key ) // we've found the element we're looking for
{
*data = root->data;
result = 1;
}
else if ( root->key < key )
{
// The input key is less than the current node's key,
// so we search the left subtree
result = find( root->left, key, data );
}
else
{
// The input key is greater than the current node's key,
// so we search the right subtree
result = find( root->right, key, data );
}
return result;
}
Assuming the tree is balanced (that is, the number of elements in the left subtree is equal to the number of elements in the right subtree), then the number of elements checked is around log2 N, where N is the total number of elements in the tree.