9

As part of answering another question, I came across a piece of code like this, which gcc compiles without complaint.

typedef struct {
    struct xyz *z;
} xyz;
int main (void) {
    return 0;
}

This is the means I've always used to construct types that point to themselves (e.g., linked lists) but I've always thought you had to name the struct so you could use self-reference. In other words, you couldn't use xyz *z within the structure because the typedef is not yet complete at that point.

But this particular sample does not name the structure and it still compiles. I thought originally there was some black magic going on in the compiler that automatically translated the above code because the structure and typedef names were the same.

But this little beauty works as well:

typedef struct {
    struct NOTHING_LIKE_xyz *z;
} xyz;

What am I missing here? This seems a clear violation since there is no struct NOTHING_LIKE_xyz type defined anywhere.

When I change it from a pointer to an actual type, I get the expected error:

typedef struct {
    struct NOTHING_LIKE_xyz z;
} xyz;

qqq.c:2: error: field `z' has incomplete type

Also, when I remove the struct, I get an error (parse error before "NOTHING ...).

Is this allowed in ISO C?


Update: A struct NOSUCHTYPE *variable; also compiles, so this seems to be valid outside structures as well. I can't find anything in the C99 standard that allows this leniency for structure pointers.

caf
paxdiablo

7 Answers

8

As the error message says in the third case, struct NOTHING_LIKE_xyz is an incomplete type, like void or arrays of unknown size. An incomplete type can only appear in a struct as a type pointed to (C17 6.7.2.1:3), with an exception for arrays of unknown size, which are allowed as the last member of a struct (a flexible array member); the struct itself remains a complete type in that case. The code that follows cannot dereference any pointer to an incomplete type (for good reason).
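A minimal sketch of the pointer rule, assuming hypothetical names (`struct never_defined`, `struct holder`, `holder_is_empty`) that do not appear in the question:

```c
#include <stddef.h>

struct never_defined;              /* incomplete: no member list is ever supplied */

struct holder {
    struct never_defined *p;       /* OK: pointer to an incomplete type */
    /* struct never_defined v; */  /* would be rejected: member has incomplete type */
};

/* Storing, copying and comparing the pointer is fine; `*h->p` would not compile. */
int holder_is_empty(const struct holder *h) {
    return h->p == NULL;
}
```

The commented-out member is the one that would trigger the "field has incomplete type" diagnostic shown in the question.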

Incomplete types can offer some datatype encapsulation of sorts in C... The corresponding paragraph in http://www.ibm.com/developerworks/library/pa-ctypes1/ seems like a good explanation.

Pascal Cuoq
  • 1
    Since this is a question marked with *language lawyer*, could you give proof about "*An incomplete type can only appear as a type pointed to*" with regard to defining a pointer to an incomplete structure type in another structure? – RobertS supports Monica Cellio Jul 11 '20 at 08:36
  • 1
    @RobertSsupportsMonicaCellio reference to https://cigix.me/c17#6.7.2.1.p3 added in the answer. – Pascal Cuoq Jul 11 '20 at 11:14
7

The parts of the C99 standard you are after are 6.7.2.3, paragraph 7:

If a type specifier of the form struct-or-union identifier occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible, then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.

...and 6.2.5 paragraph 22:

A structure or union type of unknown content (as described in 6.7.2.3) is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
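As a rough illustration of the two quoted paragraphs, the sketch below first mentions a tag and only later completes it in the same scope. The names `struct node`, `head` and `first_value` are made up for this example:

```c
struct node *head;            /* first mention: declares incomplete `struct node` */

struct node {                 /* same tag, same scope: completes the type */
    int value;
    struct node *next;
};

/* From here on, `head` may be dereferenced like any pointer to a complete type. */
int first_value(void) {
    return head ? head->value : -1;
}
```

Had the definition of `struct node` been left out, the declaration of `head` would still be accepted, but `head->value` would not.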

caf
  • That's what I wanted to see, being an anal-retentive language lawyer :-) Although it's para8 in my copy (but I've got the one updated to TC3 so that may explain that). – paxdiablo May 24 '10 at 07:19
  • This answer gives proof about how the declaration of a structure (in general) without prior definition declares a structure of incomplete type. But it doesn't answer if a pointer to an incomplete structure type is legal as member of another structure. – RobertS supports Monica Cellio Jul 11 '20 at 08:11
  • @RobertSsupportsMonicaCellio pointers are allowed as struct members and there's no text in the standard to rule out pointer to incomplete structure type – M.M Jul 11 '20 at 10:47
  • @M.M It is, as shown in Pascal Cuoq's answer below. – RobertS supports Monica Cellio Jul 11 '20 at 14:50
  • @RobertSsupportsMonicaCellio Pascal's answer explicitly says the pointer to incomplete type IS allowed in a struct – M.M Jul 11 '20 at 22:36
2

The 1st and 2nd cases are well-defined, because the size and alignment of a pointer are known. The C compiler only needs the size and alignment info to define a struct.

The 3rd case is invalid because the size of that actual struct is unknown.

But beware that for the 1st case to be logical, you need to give a name to the struct:

//             vvv
typedef struct xyz {
    struct xyz *z;
} xyz;

otherwise the outer struct and the *z will be considered two different structs.
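With the tag in place, the struct can serve as a list node. The following is only a sketch: the `value` member and the `xyz_length` helper are assumptions added for illustration.

```c
#include <stddef.h>

typedef struct xyz {
    int value;            /* hypothetical payload */
    struct xyz *z;        /* same type as the enclosing struct */
} xyz;

/* Count the nodes reachable from `head` by following `z`. */
size_t xyz_length(const xyz *head) {
    size_t n = 0;
    for (; head != NULL; head = head->z)
        n++;
    return n;
}
```

Because `struct xyz` and `xyz` now name the same type, `head = head->z` needs no cast.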


The 2nd case has a popular use case known as "opaque pointer" (pimpl). For example, you could define a wrapper struct as

 typedef struct {
    struct X_impl* impl;
 } X;
 // usually just: typedef struct X_impl* X;
 int baz(X x);

in the header, and then in one of the .c,

 #include "header.h"
 struct X_impl {
    int foo;
    int bar[123];
    ...
 };
 int baz(X x) {
    return x.impl->foo;
 }

The advantage is that outside of that .c file, you cannot mess with the internals of the object. It is a kind of encapsulation.

kennytm
  • Agreed, your explanation helped me understand the encapsulation advantage. That wikipedia article really drove the concept home! – Jon-Erik Jan 26 '12 at 18:40
  • Since this is a *language-lawyer* question, could you give proof to "*The 1st and 2nd cases are well-defined, because the size and alignment of a pointer is known.*" – RobertS supports Monica Cellio Jul 11 '20 at 08:05
1

You do have to name it. In this:

typedef struct {
    struct xyz *z;
} xyz;

the struct will not be able to point to itself, as z refers to a completely different type, not to the unnamed struct you just defined. Try this:

int main()
{
    xyz me1;
    xyz me2;
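    me1.z = &me2;   // diagnosed: incompatible pointer types
}

You'll get a diagnostic about incompatible pointer types; depending on the compiler and its settings, it may be reported as a warning rather than an error.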

R Samuel Klatchko
  • I get a warning out of gcc (c rather than c++) but +1 for pointing out the fact they're actually _different_ types. – paxdiablo May 24 '10 at 07:24
1

Well... All I can say is that your previous assumption was incorrect. Every time you use a struct X construct (by itself, or as a part of a larger declaration), it is interpreted as a declaration of a struct type with a struct tag X. It could be a re-declaration of a previously declared struct type. Or, it can be the very first declaration of a new struct type. The new tag is declared in the scope in which it appears. In your specific example it happens to be file scope (since the C language has no "class scope", as it would be in C++).

The more interesting example of this behavior is when the declaration appears in function prototype:

void foo(struct X *p); // assuming `struct X` has not been declared before

In this case the new struct X declaration has function-prototype scope, which ends at the end of the prototype. If you declare a file-scope struct X later

struct X;

and try to pass a pointer of struct X type to the above function, the compiler will give you a diagnostic about a non-matching pointer type

struct X *p = 0;
foo(p); // different pointer types for argument and parameter

This also immediately means that in the following declarations

void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);

each struct X declaration is a declaration of a different type, each local to its own function prototype scope.

But if you pre-declare struct X as in

struct X;
void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);

all struct X references in all function prototypes will refer to the same previously declared struct X type.
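A small self-contained sketch of the pre-declared case, where the prototype, the later definition of the struct, and the function body all refer to one type. The `id` member and the `get_id` function are assumptions for this example:

```c
struct X;                          /* file-scope declaration, made first */

int get_id(const struct X *p);     /* refers to the file-scope struct X */

struct X { int id; };              /* completes that same type */

int get_id(const struct X *p) {
    return p->id;
}
```

Removing the first line would give the prototype its own `struct X`, local to the prototype scope, and the later definition of `get_id` would then conflict with that prototype.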

AnT stands with Russia
0

I was wondering about this too. Turns out that the struct NOTHING_LIKE_xyz * z is forward declaring struct NOTHING_LIKE_xyz. As a convoluted example,

typedef struct {
    struct foo * bar;
    int j;
} foo;

struct foo {
    int i;
};

void foobar(foo * f)
{
    f->bar->i;
    f->bar->j;
}

Here f->bar refers to the type struct foo, not typedef struct { ... } foo. The first line will compile fine, but the second will give an error. Not much use for a linked list implementation then.

Scott Wales
  • It may be forward declaring, or struct foo may not be defined at all within the compilation unit, in which case it is an incomplete type. – Pascal Cuoq May 24 '10 at 07:05
0

When a variable or field of a structure type is declared, the compiler has to allocate enough bytes to hold that structure. Since the structure may require one byte, or it may require thousands, there's no way for the compiler to know how much space it needs to allocate. Some languages use multi-pass compilers which would be able to find out the size of the structure on one pass and allocate the space for it on a later pass; since C was designed to allow for single-pass compilation, however, that isn't possible. Thus, C forbids the declaration of variables or fields of incomplete structure types.

On the other hand, when a variable or field of a pointer-to-structure type is declared, the compiler has to allocate enough bytes to hold a pointer to the structure. Regardless of whether the structure takes one byte or a million, the pointer will always require the same amount of space. Effectively, the compiler can treat the pointer to the incomplete type as a void* until it gets more information about its type, and then treat it as a pointer to the appropriate type once it finds out more about it. The incomplete-type pointer isn't quite analogous to void*, in that one can do things with void* that one can't do with incomplete types (e.g. if p1 is a pointer to struct s1, and p2 is a pointer to struct s2, one cannot assign p1 to p2) but one can't do anything with a pointer to an incomplete type that one could not do to void*. Basically, from the compiler's perspective, a pointer to an incomplete type is a pointer-sized blob of bytes. It can be copied to or from other similar pointer-sized blobs of bytes, but that's it. The compiler can generate code to do that without having to know what anything else is going to do with the pointer-sized blobs of bytes.
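The following sketch shows that the size of a struct pointer is available even though the struct itself is never defined. It relies on the rule in section 6.2.5 of the C standard that all pointers to structure types share the same representation and alignment requirements; the function names are invented for illustration.

```c
#include <stddef.h>

struct s1;    /* both remain incomplete */
struct s2;

size_t ptr_size_s1(void) { return sizeof(struct s1 *); }
size_t ptr_size_s2(void) { return sizeof(struct s2 *); }

/* Copying a pointer needs no knowledge of what it points to. */
struct s1 *pass_through(struct s1 *p) {
    struct s1 *copy = p;
    return copy;
}
```

By contrast, `sizeof(struct s1)` would be rejected here, since the size of the pointed-to type is unknown.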

supercat
  • "*Regardless of whether the structure takes one byte or a million, the pointer will always require the same amount of space.*" - Could you give proof to that by quotes from the standard since this question is tagged *language lawyer*? – RobertS supports Monica Cellio Jul 11 '20 at 08:41
  • " Effectively, the compiler can tread the pointer to the incomplete type as a void* until it gets more information about its type" - this is not correct. `void *` might have a different size than `struct foo *` . The compiler can cope with the pointer to incomplete struct type because all such pointers must have the same size (which is not necessarily the same as `void *`). – M.M Jul 11 '20 at 10:47
  • @M.M: True, there are some obscure architectures where byte-addressable pointers are larger than other pointers, but how likely is anyone to encounter such things? IMHO, the language would be much better if the Standard were to recognize a few more categories of implementation, like "implementation designed for low-level programming on commonplace hardware", "special-purpose with aggressive optimizers [like those of clang and gcc] which are designed for tasks not requiring low-level semantics", etc. – supercat Jul 11 '20 at 13:09
  • @M.M: If the Standard were to define various categories of implementation, and means by which code could refuse to compile on inappropriate implementations, then there would be no need for arguments between people who think low-level programs for commonplace features should be able to do X, and those who think that compilers specialized for purposes like high-end number-crunching or those targeting obscure platforms shouldn't have to do X. – supercat Jul 11 '20 at 16:26
  • @M.M: If a program were to include an `#if` or other means to say it should only run on implementations where any pointer may be meaningfully accessed by dereferencing a `void**`, then compiler writers that don't want to support such constructs would be free to reject the program, but the behavior would be defined *on all compilers that don't reject it outright*. – supercat Jul 11 '20 at 16:29