8

I'm stuck on understanding what's happening with this struct (from C for Programmers a Deitel book).

The book says, "A structure cannot contain an instance of itself. For example, a variable of type struct employee cannot be declared in the definition for struct employee. A pointer to struct employee, however, may be included."

It then gives the following example:

struct employee2 {
   char firstName[ 20 ];

   char lastName[ 20 ];

   unsigned int age;

   struct employee2 *ePtr;
};

I don't understand what this is doing, and I don't understand the reference to struct employee without the 2.

How does struct employee2 *ePtr know about struct employee or am I off basis here?

Jean-Bernard Pellerin
  • 12,556
  • 10
  • 57
  • 79
nullByteMe
  • 6,141
  • 13
  • 62
  • 99
  • Maybe getting a structs or struct elements address is slower than accesing an element which already has the address. – huseyin tugrul buyukisik May 24 '13 at 18:44
  • 6
    There is no `struct employee` in your cited code; only `struct employee2`. – jamesdlin May 24 '13 at 18:44
  • @jamesdlin the book doesn't reference code to `struct employee`. That's why I didn't include it. – nullByteMe May 24 '13 at 18:50
  • This is a duplicate but my answer is distinct from any on there and OP expressed it was helpful - so questions should be merged not closed. http://stackoverflow.com/questions/588623/self-referential-struct-definition – djechlin May 24 '13 at 18:54
  • 1
    @nkon: You asked "how does `struct employee2 *ePtr` know about `struct employee`". It doesn't know and doesn't need to know; `struct employee` is not involved in any way. `struct employee2` is. – jamesdlin May 24 '13 at 21:26

4 Answers4

13

A more meaningful example might be

struct employee2* manager;

Note that to remove the * means the C compiler must lay out the 44 (or so) bytes needed for the top-level employee, then another 44 bytes for the next inner employee, then the 44 for the next next inner employee, then 44 for the next next next inner employee... and so forth. Needless to say this is a compile error.

Furthermore this impossible structure would force them to all be distinct employees, and would require that when you create any employee you create all the managers, which have to be not null, and distinct. This means you can't have a CEO, whereas with a pointer the CEO's manager could be NULL or herself, depending on your implementation. It also makes it rather impossible to change managers without deleting a record from the employee system (i.e. firing an employee) and recreating it (hiring), which also requires revoking building access, computer access, and so forth. My point in saying this is that to not have a pointer is a really, really bad way to model what's going on in the real world.

However the C compiler can lay out 44 bytes for the employee then 4 for the address of the employee's manager, which will in turn point to 44+4 bytes, if it's not null. Note these aren't necessarily distinct bytes - perhaps an employee is her own manager (your business logic should prohibit this but hey, what does C care).

A lower-level example is a linked list, which goes something more like this:

typedef struct {
    int data;
    node* next;
} node;

But again, same idea. Unless you have all infinity distinct nodes ready to create at once, this won't work. Linked lists will end in a NULL value, which is possible to express with a pointer, but not a structure that can't be null since it needs to take memory.

Anyway pointers are ways for one structure to refer to another structure without physically laying out the memory again. C is a low level language but if you learn to think from the point of view of the compiler some higher level concepts will make sense. For instance, to delete the * would also mean that an employee "owns" her manager. This doesn't make sense from a real-world perspective and doesn't make sense from a memory management perspective. (Although, there are senses in which parents can own children... this isn't a perfect analogy here.)

djechlin
  • 59,258
  • 35
  • 162
  • 290
5

The magic here becomes clear when you understand the C concept of an incomplete type. A struct can only contain completed types, i.e. those that the compiler knows the size of. When the compiler sees

struct foo {

it knows that there will be a struct with the tag foo; this type (struct foo) is incomplete at this very moment. It becomes complete not until the matching } is seen.

However, and this is the magic, a pointer to an incomplete type is a complete type, because the size of any pointer is known--no matter what type it points to. So after the three tokens above, it is okay to declare a struct member

  struct foo *ptr_to_struct_foo;

inside a struct foo and before the matching }.

Another often used incomplete type is void--this type cannot even be completed, which is why C doesn't allow you to declare

void foo;

However, it is perfectly fine to declare pointers to such a type:

void *foo;

But indirecting through a ptr-to-void of course is not allowed. Why? Now you know the answer: because that would yield an incomplete type.

Jens
  • 69,818
  • 15
  • 125
  • 179
  • So I can technically do this: `struct foo {` and then create a pointer to this and the compiler won't whine? – nullByteMe May 24 '13 at 18:54
  • No, you must provide a syntactically valid declaration with at least one member and a closing `}`. However, you may use `struct foo *` immediately after these tokens to declare a member of the struct. – Jens May 24 '13 at 18:56
3

you can not put a whole struct inside of itself because it would be infinitely recursive.

but you can put the address of another instance of a struct inside of a struct; which is what a pointer is... an address the size of SomeStruct* is always the same, so the compiler knows how much memory to make for one instance of the struct.

Grady Player
  • 14,399
  • 2
  • 48
  • 76
1

This would cause the size to be infinite. Think about the struct:

struct my_struct {
  int x;
  struct my_struct y;
}

This would have a size:

sizeof(struct my_struct) >= sizeof(int) + sizeof(struct my_struct);

Which clearly is not solvable.

However, a pointer to a struct would not have this problem.

struct my_struct2 {
  int x;
  struct my_struct2* y;
}

As the size is now possible.

sizeof(struct my_struct2) >= sizeof(int) + sizeof(struct my_struct2*);

You should understand the difference between

sizeof(struct my_struct2)
and
sizeof(struct my_struct2*)
robbie_c
  • 2,428
  • 1
  • 19
  • 28