Data design: better to nest structures or pointers to structures?

Question

Working in plain C, is it better to nest structures inside other structures, or to nest pointers to structures? Using pointers makes it easier to have good alignment, but then accessing the inner structures requires an additional dereference. To put this in concrete terms:

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

typedef struct {
        unsigned int key;
        SVALUE* name;
        SVALUE* surname;
        SVALUE* date_of_birth;
        SVALUE* date_of_death;
        SVALUE* place_of_birth;
        SVALUE* place_of_death;
        SVALUE* floruit;
} AUTHOR;

typedef struct {
        SVALUE media_type;
        SVALUE title;
        AUTHOR author;
} MEDIA;

Here we have some nested structures, in some cases nesting a pointer to the inner structure and in others embedding the structure itself.

One issue besides alignment and dereferencing is how memory is allocated. If I do not use pointers, and use pure nested structures, then when the instance of the structure is allocated, the entire nested tree is allocated in one step (and must also be freed in one step). However, if I use pointers, then I have to allocate and free the inner members separately, which means more lines of code but potentially more flexibility because I can, for example, leave members null if the record has no value for that field.
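
A minimal sketch of the two allocation patterns, assuming a cut-down record with only two members. AUTHOR_EMBEDDED, AUTHOR_POINTERS and the three helper functions are hypothetical names used for illustration, and the char* string buffers are left unallocated to keep it short:

```c
#include <stdlib.h>

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

/* Hypothetical variant: members embedded directly. */
typedef struct {
        unsigned int key;
        SVALUE name;
        SVALUE surname;
} AUTHOR_EMBEDDED;

/* Hypothetical variant: members held through pointers. */
typedef struct {
        unsigned int key;
        SVALUE* name;
        SVALUE* surname;
} AUTHOR_POINTERS;

/* One malloc covers the record and both embedded SVALUEs. */
AUTHOR_EMBEDDED* author_embedded_new(unsigned int key)
{
        AUTHOR_EMBEDDED* a = malloc(sizeof *a);
        if (a != NULL)
                *a = (AUTHOR_EMBEDDED){ .key = key };   /* other members zeroed */
        return a;
}

/* The record and each member it points to are allocated separately.
   Only name is allocated here; surname stays NULL to mean "no value". */
AUTHOR_POINTERS* author_pointers_new(unsigned int key)
{
        AUTHOR_POINTERS* a = malloc(sizeof *a);
        if (a == NULL)
                return NULL;
        *a = (AUTHOR_POINTERS){ .key = key };       /* name, surname start NULL */
        a->name = malloc(sizeof *a->name);
        if (a->name == NULL) {
                free(a);
                return NULL;
        }
        *a->name = (SVALUE){ 0 };
        return a;
}

/* Members are released first, then the record itself. */
void author_pointers_free(AUTHOR_POINTERS* a)
{
        if (a == NULL)
                return;
        free(a->name);          /* free(NULL) is a no-op */
        free(a->surname);
        free(a);
}
```

The embedded record is released with a single free(), while the pointer-based one needs a matching cleanup function.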

Which approach is preferable?

Depends on what you are trying to do. You have correctly analyzed the major pros and cons already. Both ways are valid. There is no single right answer to this question without a specific problem in mind. — Mad Physicist, Oct 16 '14 at 21:37

Matheus Moreira · Answer 1 · 2014-10-16T22:00:40.300

Nesting structures ensures their spatial locality, since the entire object is actually just one big block of memory even though it is made up of several structures; in memory, the tree is flattened and all members are stored contiguously. This might result in better use of fast memory such as processor caches. If you nest pointers to other structures, the extra level of indirection means the nested data may be stored in a faraway location, which can defeat such optimizations; dereferencing the pointer may then miss the cache, so the data has to be fetched from main memory. Directly nesting data also simplifies access to structure members for purposes such as serialization and transmission.
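
To make the contiguity point concrete, here is a small sketch. MEDIA_EMBEDDED, MEDIA_POINTER and lies_within are hypothetical names, and the address comparison through uintptr_t assumes a flat address space:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

/* Hypothetical cut-down MEDIA with two embedded members. */
typedef struct {
        SVALUE media_type;
        SVALUE title;
} MEDIA_EMBEDDED;

/* Hypothetical variant that refers to its title through a pointer. */
typedef struct {
        SVALUE media_type;
        SVALUE* title;
} MEDIA_POINTER;

/* Returns 1 if the n bytes starting at p lie inside the
   outer_size bytes starting at outer, 0 otherwise. */
int lies_within(const void* outer, size_t outer_size, const void* p, size_t n)
{
        uintptr_t lo = (uintptr_t)outer;
        uintptr_t at = (uintptr_t)p;
        return at >= lo && at - lo <= outer_size && n <= outer_size - (at - lo);
}
```

For a MEDIA_EMBEDDED object m, lies_within(&m, sizeof m, &m.title, sizeof m.title) holds by construction. For a MEDIA_POINTER, only the pointer itself is inside the object; the SVALUE it points to lives wherever it was allocated.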

It also has other implications, such as the impact on the size of your structure and the effects of passing its objects around by value. If you directly nest structures, the sizeof your structure will likely be much bigger than if you had nested pointers. Bigger structures have a larger memory footprint, which can grow noticeably if copies are being made all the time. If the objects are not opaque, they can be allocated on the stack and quickly overflow it. The larger a struct is, the better suited it is to dynamic allocation and indirect access through pointers. I speculate that copying large amounts of data around also carries a cost in speed, but I'm not sure.
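
A quick way to see the size difference is to compare the two layouts directly. This is only a sketch: AUTHOR_EMBEDDED, AUTHOR_POINTERS and the helper functions are hypothetical names, and the exact sizes depend on the platform (on a typical 64-bit system I would expect something like 120 versus 64 bytes, but that is an assumption about pointer size and padding):

```c
#include <stddef.h>

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

/* Hypothetical AUTHOR with every member embedded. */
typedef struct {
        unsigned int key;
        SVALUE name, surname, date_of_birth, date_of_death,
               place_of_birth, place_of_death, floruit;
} AUTHOR_EMBEDDED;

/* Same members as the AUTHOR in the question, held through pointers. */
typedef struct {
        unsigned int key;
        SVALUE *name, *surname, *date_of_birth, *date_of_death,
               *place_of_birth, *place_of_death, *floruit;
} AUTHOR_POINTERS;

/* Extra bytes each embedded record takes compared to the pointer layout. */
size_t author_size_difference(void)
{
        return sizeof(AUTHOR_EMBEDDED) - sizeof(AUTHOR_POINTERS);
}

/* By value: the whole record is (conceptually) copied for the call. */
unsigned int key_by_value(AUTHOR_EMBEDDED a)
{
        return a.key;
}

/* By pointer: only the address is passed. */
unsigned int key_by_pointer(const AUTHOR_EMBEDDED* a)
{
        return a->key;
}
```

Note that the pointer layout only makes the record itself smaller; the pointed-to SVALUE objects still have to live somewhere.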

Pointers provide additional semantics which may or may not be desirable in your case. They:

Can be NULL, indicating that the data is not available or is possibly optional
Create links between separate structures and allow one structure to exist without the other
Allow two different structures to be allocated differently and to have distinct lifetimes
Allow many different structures to share one possibly big common nested value without wasting memory
Let you point to data which has not even been fully defined yet
- You can point to opaque structures, which cannot be instantiated on the stack because the compiler does not yet know their size
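
Several of these points show up in a small sketch with two structures that refer to each other, which is only possible through pointers. The names struct writer, struct work and writer_work_count are made up for illustration:

```c
#include <stddef.h>

struct work;                        /* declared, but not yet defined */

struct writer {
        unsigned int key;
        struct work* first_work;    /* may be NULL: no recorded work */
};

struct work {
        unsigned int key;
        struct writer* writer;      /* several works can share one writer */
        struct work* next;          /* next work by the same writer, or NULL */
};

/* Counts the works reachable from a writer; 0 when first_work is NULL. */
size_t writer_work_count(const struct writer* w)
{
        size_t n = 0;
        for (const struct work* k = w->first_work; k != NULL; k = k->next)
                n++;
        return n;
}
```

Here struct writer holds a pointer to struct work before the compiler has seen its definition. Embedding a struct work member at that point would not compile.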

AnT stands with Russia · Answer 2 · 2014-10-16T21:54:43.040

There are too many factors involved in making such decisions. Most of the time it is not a matter of preference. It is a matter of ownership, lifetime and memory management.

Every object "lives" somewhere and is owned by someone/something. Whoever owns an object has control over its lifetime, among other things. Everybody else can only refer to that object through pointers.

When a struct object is directly nested into another struct object, the nested object is owned by the object it is nested into. In your example each MEDIA object owns its media_type, title and author subobjects. They begin their lives together with their owning MEDIA object and they die together with that object.

Meanwhile, at first sight the AUTHOR object does not own its name, surname and other subobjects. The AUTHOR object simply refers to them. The name, surname and other SVALUE subobjects live somewhere else; they are owned and managed by someone/something else.

At first sight, this looks like a strange design. Why doesn't AUTHOR own its name? One possible reason is that we are dealing with a database where many authors have identical names, surnames etc. In that case, to save memory, it might make sense to store these SVALUE objects in some external container (a hash set, for example), which keeps only one copy of each specific SVALUE. Meanwhile, AUTHOR objects simply refer to those SVALUE objects. I.e. all AUTHOR objects with the name "John" will refer to the same SVALUE "John".

In such a case, it is that hash set that owns these SVALUE objects.
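
A rough sketch of such an owning container follows. To keep it short, it uses a fixed-size array with linear search as a stand-in for a real hash set; SVALUE_POOL, POOL_CAPACITY, pool_intern and pool_destroy are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

#define POOL_CAPACITY 64

/* Owns every SVALUE it hands out; a stand-in for a hash set. */
typedef struct {
        SVALUE* items[POOL_CAPACITY];
        size_t count;
} SVALUE_POOL;

/* Returns the pool's single SVALUE for text, creating it on first use.
   Returns NULL if the pool is full or an allocation fails. */
SVALUE* pool_intern(SVALUE_POOL* pool, const char* text)
{
        size_t len = strlen(text);
        for (size_t i = 0; i < pool->count; i++) {
                SVALUE* found = pool->items[i];
                if (found->length == len && memcmp(found->string, text, len) == 0)
                        return found;           /* reuse the existing copy */
        }
        if (pool->count == POOL_CAPACITY)
                return NULL;
        SVALUE* v = malloc(sizeof *v);
        if (v == NULL)
                return NULL;
        v->string = malloc(len + 1);
        if (v->string == NULL) {
                free(v);
                return NULL;
        }
        memcpy(v->string, text, len + 1);       /* includes the terminator */
        v->length = (unsigned int)len;
        pool->items[pool->count++] = v;
        return v;
}

/* Only the owner frees the SVALUEs; AUTHOR objects must not. */
void pool_destroy(SVALUE_POOL* pool)
{
        for (size_t i = 0; i < pool->count; i++) {
                free(pool->items[i]->string);
                free(pool->items[i]);
        }
        pool->count = 0;
}
```

Two AUTHOR objects whose name is "John" would both store the pointer returned by pool_intern(&pool, "John"), and neither of them would ever free it.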

But if AUTHOR is actually supposed to own its name, yet a pointer is used just to have an opportunity to leave it null... this does not strike me as a particularly good design, especially considering that SVALUE object already has its own capacity for representing null values. Unless you are looking at significant memory savings from the ability to leave some fields null, it would be a better idea to store name directly in AUTHOR.
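
For example, assuming the convention that a NULL string pointer means "no value" (my assumption, the question does not state one), an embedded SVALUE can represent a missing field on its own. AUTHOR_EMBEDDED and svalue_is_null are hypothetical names:

```c
#include <stddef.h>

typedef struct {
        unsigned int length;
        char* string;
} SVALUE;

/* Hypothetical cut-down AUTHOR that owns its members directly. */
typedef struct {
        unsigned int key;
        SVALUE name;
        SVALUE date_of_death;
} AUTHOR_EMBEDDED;

/* Assumed convention: string == NULL means the field has no value. */
int svalue_is_null(const SVALUE* v)
{
        return v->string == NULL;
}
```

With this, a living author simply has a date_of_death whose string is NULL, and no separate allocation or pointer check on the SVALUE itself is needed.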

Now, if you don't need any sort of cross-referencing between different data structures, then you simply don't need pointers. In other words, if the object is only known to its owner and no one else, then using pointers and allocating sub-objects independently makes very little sense. In such cases it makes much more sense to nest structures directly.

On the other hand, some designs might not allow you to nest objects directly. Such designs might declare opaque struct types, which can only be instantiated through an API allocator function returning a pointer. In such designs you are forced to use pointers. But this is not the case in your example, I believe.
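
A sketch of what such an opaque design might look like, squeezed into one file for brevity. CATALOG, SHELF and the catalog_* functions are hypothetical names; in a real project the first part would sit in a header and the second in a separate source file:

```c
#include <stdlib.h>

/* What a header would expose: the type name and functions only. */
typedef struct catalog CATALOG;         /* opaque: definition hidden */
CATALOG* catalog_create(void);
void catalog_destroy(CATALOG* c);
unsigned int catalog_count(const CATALOG* c);

/* Client code can only hold a pointer. Writing "CATALOG catalog;"
   here would not compile, because the size is not known yet. */
typedef struct {
        unsigned int key;
        CATALOG* catalog;
} SHELF;

/* What the implementation file would contain. */
struct catalog {
        unsigned int count;
};

CATALOG* catalog_create(void)
{
        CATALOG* c = malloc(sizeof *c);
        if (c != NULL)
                c->count = 0;
        return c;
}

void catalog_destroy(CATALOG* c)
{
        free(c);
}

unsigned int catalog_count(const CATALOG* c)
{
        return c->count;
}
```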

@Mad Physicist: The question is not sufficiently specific to allow such a definitive conclusion as "This is not an answer to the question". I think it does answer the question to the degree the question can be considered valid. — AnT stands with Russia, Oct 16 '14 at 21:38
I did not flag or vote you down exactly for that reason. I did flag the question though. It is basically meaningless since both syntaxes are perfectly valid and a situation can always be constructed where one is better than the other. — Mad Physicist, Oct 16 '14 at 21:48
@Mad Physicist: At the syntax level both are indeed valid. But this is not meant to be a syntax question. This is a data structure design question, as it clearly states in the title. — AnT stands with Russia, Oct 16 '14 at 21:50
Without a specific example the question is still meaningless. It is similar to asking "is list better than vector?" Both types are provided because they are useful in different situations. — Mad Physicist, Oct 16 '14 at 21:54
