There are actually four name-spaces in C (although this depends on a particular way of counting, and some include macro names as a fifth space, which I think is a valid way to think about them):
- goto labels
- tags (struct, union, and enum)
- the actual members of a struct or union type (one per type, hence you could count this as "many" instead of "one" name space)
- all other ("ordinary") identifiers, such as function and variable names and the names made to be synonyms for other types via typedef.
While it should (in theory) be possible to have separate spaces for struct vs union, for instance, C does not, so:
struct foo; union foo; /* ERROR */
is invalid. Yet:
struct foo { int a, b; };
struct bar { char b; double a; };
is just fine, showing that the members of the two different struct types are in different name-spaces (so again this makes the count of "4 name-spaces" above suspect :-) ).
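To make the name-space rules concrete, here is a small sketch (all names are purely illustrative) that uses the single identifier foo as a tag, as a member, as an ordinary variable, and as a goto label, all at once. It should compile because each use lives in a different name-space:

```c
/* "foo" as a struct tag (tag name-space) and as a member (struct foo's member name-space). */
struct foo { int foo; };
/* union foo; here would be an error: struct and union tags share one name-space. */

/* Hypothetical demo function: returns 7. */
int namespaces_demo(void)
{
    struct foo foo = { 7 };   /* "foo" as an ordinary identifier (a variable) */
    goto foo;                 /* "foo" as a goto label */
foo:
    return foo.foo;           /* variable foo, member foo */
}
```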
All that aside, C has some moderately (and in some ways unnecessarily) complicated, but quite workable in practice, rules for how struct types work.
Each struct creates a new type unless it refers back to an existing type. The struct keyword may be followed by an identifier, or just an open brace {. If there is just an open brace, the struct creates a new type:
struct { ... } X; /* variable X has a unique type */
If there is an identifier, the compiler must look at the (single) tag name-space to see if that name is already defined. If not, the struct defines a new type:
struct blart { ... } X; /* variable X has type "struct blart", a new type */
If the identifier is already present, generally this refers back to the existing type:
struct blart Y; /* variable Y has the same type as variable X */
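A compilable sketch of the difference between tagged and untagged struct types; the member n and the function same_type_demo are illustrative assumptions. X and Y share one type because the second declaration refers back through the tag, while each untagged struct { ... } is a brand-new type even when the member lists are identical:

```c
struct blart { int n; } X;   /* defines the new type "struct blart" and variable X */
struct blart Y;              /* refers back: Y has the same type as X */

struct { int n; } P;         /* untagged: a unique type */
struct { int n; } Q;         /* another, different unique type */

/* Hypothetical demo function: returns 42. */
int same_type_demo(void)
{
    X.n = 42;
    Y = X;         /* OK: X and Y have the same type */
    /* P = Q; */   /* would not compile: P and Q have different types */
    Q.n = Y.n;     /* copying member by member is still fine */
    P.n = Q.n;
    return P.n;
}
```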
There is one special exception, though. If you're in a new scope (such as at the beginning of a function), a "vacuous declaration" (the struct keyword, followed by an identifier, followed by a semicolon) "clears out" the previous visible type:
void func(void) {
    struct blart; /* get rid of any existing "struct blart" */
    struct blart { char *a; int b; } v;
    ...
}
Here v has a new type, even if struct blart was already defined outside func.
(This "vacuous declaration" trick is mostly useful in obfuscated code contests. :-) )
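One case where the vacuous declaration genuinely changes the meaning of later code is a forward reference inside the new scope. In this sketch, struct holder, its member bp, and vacuous_demo are hypothetical names. Without the struct blart; line, the struct blart *bp inside struct holder would refer to the outer, file-scope type, so initializing h with &v and using h.bp->b would fail to compile:

```c
struct blart { double blartness; };   /* file-scope struct blart */

/* Hypothetical demo function: returns 2. */
int vacuous_demo(void)
{
    struct blart;                        /* vacuous declaration: new, incomplete block-scope type */
    struct holder { struct blart *bp; }; /* bp points to the block-scope type, not the file-scope one */
    struct blart { char *a; int b; } v = { "inner", 2 };  /* completes the block-scope type */
    struct holder h = { &v };            /* pointer types match only because of the vacuous declaration */
    return h.bp->b;
}
```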
If you're not at a new scope, a vacuous declaration serves the purpose of declaring that the type exists. This is mainly useful to work around a different issue, which I will cover in a moment.
struct blart;
Here struct blart alerts you (and the compiler) that there is now a type named "struct blart". This type is merely declared, meaning that the struct type is "incomplete", if struct blart has not yet been defined. This type is defined (and "complete") if struct blart has been defined. So:
struct blart { double blartness; };
defines it, and then any earlier or later mentions of struct blart refer to the same type.
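A short sketch of a declaration followed later by a definition (p_early and complete_demo are illustrative names). A pointer to the still-incomplete type is allowed, and once the definition appears, that pointer refers to the very same, now complete, type:

```c
struct blart;                  /* struct blart now exists, but is incomplete */
struct blart *p_early;         /* a pointer to an incomplete type is allowed */
/* sizeof(struct blart) here would not compile: the type is still incomplete */

struct blart { double blartness; };  /* completes the very same type */

/* Hypothetical demo function: returns 2.5. */
double complete_demo(void)
{
    static struct blart b = { 2.5 };
    p_early = &b;              /* no cast needed: p_early already points to this type */
    return p_early->blartness;
}
```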
Here's why this sort of declaration is useful. In C, any declaration of an identifier has scope. There are four possible scopes: "file", "block", "prototype", and "function". The last one (function scope) is exclusively for goto labels, so we can ignore it from here on. That leaves file, block, and prototype scopes. File scope is a technical term for what most people think of as "global", in contrast with "block scope" which is "local":
struct blart { double blartness; } X; /* file scope */
void func(void) {
    struct slart { int i; } v; /* block scope */
    ...
}
Here struct blart has file scope (as does "global" variable X), and struct slart has block scope (as does "local" variable v).
When the block ends, struct slart goes away. You can no longer refer to it by name; a later struct slart creates a new and different type, in exactly the same way that a later int v; creates a new v, and does not refer to the v within the block scope inside function func.
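A sketch of block-scope tags going away (slart_first and slart_second are illustrative names). Each function has its own struct slart; the second is a new and different type with different members, and neither is visible at file scope:

```c
struct blart { double blartness; } X;   /* file scope, like a global variable */

/* Hypothetical demo: returns 1. */
int slart_first(void)
{
    struct slart { int i; } v = { 1 };          /* block scope: gone after the closing brace */
    return v.i;
}

/* Hypothetical demo: returns 5. */
int slart_second(void)
{
    struct slart { int i, j; } v = { 2, 3 };    /* a new, unrelated struct slart */
    return v.i + v.j;
}
```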
Alas, the committee that designed the original C standard included (for good reason) one more scope, inside the function prototype, in a way that interacts rather badly with these rules. If you write a function prototype:
void proto(char *name, int value);
the identifiers (name and value) disappear after the closing parenthesis, just as you'd expect: you wouldn't want this to create a block-scope variable called name. Unfortunately, the same happens with struct:
void proto2(struct ziggy *stardust);
The name stardust goes away, but so does struct ziggy. If struct ziggy did not appear earlier, the new, incomplete type created inside the prototype has now been removed from all human reach. It can never be completed. Good C compilers print a warning here.
The solution is to declare the struct—whether complete or not [*]—before writing the prototype:
struct ziggy; /* hey compiler: "struct ziggy" has file scope */
void proto2(struct ziggy *stardust);
This time, struct ziggy has an already-existing, visible declaration to refer back to, so it uses the existing type.
[* In header files, for instance, you often don't know if the header that defines the struct has been included, but you can declare the struct yourself, and then write prototypes that use pointers to it.]
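A sketch of the fix in a complete translation unit (the member played_guitar and the function ziggy_demo are made up for illustration). If the first line is removed, compilers such as GCC typically warn that struct ziggy is declared inside the parameter list, and the later definition of proto2 then conflicts with its own prototype:

```c
struct ziggy;                           /* file-scope declaration, before the prototype */
void proto2(struct ziggy *stardust);    /* refers back to the file-scope type */

struct ziggy { int played_guitar; };    /* completed later: still the same type */

void proto2(struct ziggy *stardust)
{
    stardust->played_guitar = 1;
}

/* Hypothetical demo function: returns 1. */
int ziggy_demo(void)
{
    struct ziggy z = { 0 };
    proto2(&z);   /* compatible: prototype and definition name the same struct ziggy */
    return z.played_guitar;
}
```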
Now, as to typedef...
The typedef keyword is syntactically a storage-class specifier, like register and auto, but it acts quite weirdly. It sets a flag in the compiler that says: "change variable declarations into type-name aliases".
If you write:
typedef int TX, TY[3], *TZ;
the way that you (and the compiler) can understand this is to start by removing the typedef keyword. The result needs to be syntactically valid, and it is:
int TX, TY[3], *TZ;
This would declare three variables:
- TX has type int
- TY has type "array 3 of int"
- TZ has type "pointer to int"
Now you (and the compiler) put the typedef back in, and change "has" to "is another name for":
- TX is another name for type int
- TY is another name for "array 3 of int"
- TZ is another name for "pointer to int"
The typedef keyword works with struct types in exactly the same way. It's the struct keyword that creates the new type; then typedef changes the variable declaration(s) from "has type ..." to "is another name for type ...". So:
typedef struct ca ca_t;
starts by either creating a new type, or referring back to an existing type, struct ca, as usual. Then, instead of declaring a variable ca_t as having type struct ca, it declares the name as another name for the type struct ca.
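A sketch tying both typedef examples together (the member n and typedef_demo are illustrative assumptions). The TX, TY, and TZ aliases behave exactly like the variable declarations they were derived from, and ca_t is interchangeable with struct ca, even though ca_t was created while struct ca was still incomplete:

```c
typedef int TX, TY[3], *TZ;    /* three aliases from one declaration */

typedef struct ca ca_t;        /* struct ca created (incomplete); ca_t is another name for it */
struct ca { int n; };          /* completing struct ca also completes ca_t */

/* Hypothetical demo function: returns 1 + 20 + 5 = 26. */
int typedef_demo(void)
{
    TX x = 1;                  /* same as: int x = 1; */
    TY arr = { 10, 20, 30 };   /* same as: int arr[3] = { 10, 20, 30 }; */
    TZ p = &arr[1];            /* same as: int *p = &arr[1]; */
    struct ca a = { 5 };
    ca_t *cp = &a;             /* no cast: ca_t and struct ca are one type */
    return x + *p + cp->n;
}
```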
If you omit the struct tag name, you are left with only two valid syntactic patterns:
typedef struct { ... }; /* note: this is pointless */
or:
typedef struct { char *top_coat; int top_hat; } zz_t, *zz_p_t;
Here, struct { creates a new type (remember, we said this way back at the beginning!), and then after the closing }, the identifiers that would have declared variables now make type-aliases. Again, the type was actually created by the struct keyword (although it hardly matters this time; the typedef-names are now the only ways to refer to the type).
(The first pointless pattern looks the way it does because, without the braces, the first identifier you stick in becomes the struct-tag:
typedef struct tag; /* (still pointless) */
and thus you haven't omitted the tag after all!)
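A sketch of the tagless pattern in use (the initial values and zz_demo are illustrative). Because the struct has no tag, zz_t and zz_p_t are the only ways to name its type:

```c
typedef struct { char *top_coat; int top_hat; } zz_t, *zz_p_t;

/* Hypothetical demo function: returns 3. */
int zz_demo(void)
{
    zz_t z = { "tails", 3 };
    zz_p_t zp = &z;            /* zz_p_t is "pointer to" the same unnamed struct type */
    return zp->top_hat;
}
```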
As for the last question, about the syntax error, the problem here is that C is designed as a "single pass" language, where you (and the compiler) never have to look very far forward to find out what something is. When you attempt something like this:
typedef struct list {
...
List *next; /* ERROR */
} List;
you've given the compiler too much to digest at once. It starts by (in effect) ignoring the typedef keyword except to set the flag that changes the way variables will be declared. This leaves you with:
struct list {
...
List *next; /* ERROR */
}
The name List is simply not yet available, so the attempt to use List *next; does not work. Eventually the compiler would reach the "variable declaration" (and because the flag is set, change it to a type-alias instead), but it's too late by then; the error has already occurred.
The solution is the same as with function prototypes: you need a "forward declaration". The forward declaration will give you an incomplete type until you finish defining the struct list part, but that's OK: C lets you use incomplete types in a number of positions, including when you want to declare a pointer, and including with typedef alias-creation. So:
typedef struct list List; /* incomplete type "struct list" */
struct list { /* begin completing "struct list" */
...
List *next; /* use incomplete "struct list", through the type-alias */
}; /* this "}" completes the type "struct list" */
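The fixed version, filled out into a compilable sketch (the value member and list_demo are assumptions for illustration), building a three-node list on the stack and walking it through the List alias:

```c
#include <stddef.h>   /* NULL */

typedef struct list List;   /* incomplete type "struct list", aliased as List */
struct list {               /* begin completing "struct list" */
    int value;
    List *next;             /* pointer to the (still incomplete) type: allowed */
};                          /* this "}" completes the type */

/* Hypothetical demo: builds 1 -> 2 -> 3 and returns the sum, 6. */
int list_demo(void)
{
    List c = { 3, NULL };
    List b = { 2, &c };
    List a = { 1, &b };
    int sum = 0;
    for (const List *p = &a; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```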
This gains relatively little over just writing struct list everywhere (it saves a bit of typing, but so what? Well, OK, some of us suffer a bit of carpal tunnel / RSI issues :-) ).
[Note: this last segment is going to cause controversy... it always does.]
In fact, if you mentally replace struct with type, C code becomes a whole lot nicer to "strongly typed language" fans. Instead of the terrible [%], weak-sauce:
typedef int distance; /* distance is measured in discrete units */
typedef double temperature; /* temperatures are fractional */
they can write:
#define TYPE struct
TYPE distance;
TYPE temperature;
These, being incomplete types, are truly opaque. To create or destroy or indeed do anything with a distance value you must call a function (and—for most variables anyway; there are some exceptions for external identifiers—use pointers, alas):
TYPE distance *x = new_distance(initial_value);
increase_distance(x, increment);
use_distance(x);
destroy_distance(x);
Nobody can write:
*x += 14; /* 3 inches in a dram, 14 ounces in a foot */
It simply won't compile.
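A single-file sketch of how such an opaque type might be implemented. new_distance, increase_distance, and destroy_distance come from the example above, but their bodies, the units member, and the distance_units accessor are assumptions. In a real program the first part would be a header and the second part a separate .c file, so that client code never sees the completed type:

```c
#include <stdlib.h>

#define TYPE struct

/* --- would be "distance.h": clients only ever see an incomplete type --- */
TYPE distance;
TYPE distance *new_distance(int initial_value);
void increase_distance(TYPE distance *d, int increment);
int distance_units(const TYPE distance *d);   /* hypothetical accessor */
void destroy_distance(TYPE distance *d);

/* --- would be "distance.c": the only place the type is completed --- */
TYPE distance { int units; };

TYPE distance *new_distance(int initial_value)
{
    TYPE distance *d = malloc(sizeof *d);
    if (d != NULL)
        d->units = initial_value;
    return d;                     /* NULL if allocation failed */
}

void increase_distance(TYPE distance *d, int increment) { d->units += increment; }

int distance_units(const TYPE distance *d) { return d->units; }

void destroy_distance(TYPE distance *d) { free(d); }
```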
Those who are a bit less bondage-and-discipline with their type systems can relax the constraints by completing the type:
TYPE distance { int v; };
TYPE temperature { double v; };
Of course, now "cheaters" can do:
TYPE distance x = { 0 };
x.v += 14; /* 735.5 watts in a horsepower */
(well, at least that last comment is correct).
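A sketch of what survives once the types are completed (in_fahrenheit and cheat_demo are hypothetical). Members become fair game, but distance and temperature remain distinct, incompatible types, which plain typedefs of int and double would not give you:

```c
#define TYPE struct

TYPE distance { int v; };
TYPE temperature { double v; };

/* Only accepts a temperature; passing a distance would not compile. */
double in_fahrenheit(TYPE temperature t)
{
    return t.v * 9.0 / 5.0 + 32.0;
}

/* Hypothetical demo: returns 212.0 + 14 = 226.0. */
double cheat_demo(void)
{
    TYPE distance x = { 0 };
    x.v += 14;                        /* allowed now that the type is complete */
    TYPE temperature t = { 100.0 };
    /* in_fahrenheit(x); */           /* error: incompatible argument type */
    return in_fahrenheit(t) + x.v;
}
```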
[% Not really that terrible, I think. Some seem to disagree.]