Embedded versus pointer in nested String-like structures

Question

When designing structures to contain textual data, I have been using two basic approaches illustrated below:

typedef struct {
    STRING address1;
    STRING address2;
    STRING city;
    STRING state;
    STRING zip;
} ADDRESS;

typedef struct {
    STRING* address1;
    STRING* address2;
    STRING* city;
    STRING* state;
    STRING* zip;
} ADDRESS;

where STRING is some variable-length string-storing type. The advantage of the pointer version is that I can store NULL to indicate that data is missing. For example, address2 might not be provided for some addresses. In the type with embedded STRINGs, I have to use a "blank" string, meaning one that has 0 length.

With the pointers there is (possibly) more code burden because I have to check every member for NULL before using it. The embedded version's advantage here is not that great, however, because it usually has to be checked too. For example, if I am printing an address, I have to check for a zero-length string and skip that line. With pointers the user can actually indicate they want a "blank" versus a missing value, although it is hard to see a use for this.
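To make the comparison of checks concrete, here is a minimal sketch of printing both variants. The STRING layout (a data pointer plus a length) and the names ADDRESS_EMB, ADDRESS_PTR, print_field and print_field_ptr are assumptions made up for this illustration, not the real definitions.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical STRING layout, assumed only for this sketch. */
typedef struct {
    const char *data;  /* may be NULL when len == 0 */
    size_t      len;
} STRING;

/* The two designs from the question, renamed so both can coexist. */
typedef struct { STRING  address1, address2, city, state, zip; } ADDRESS_EMB;
typedef struct { STRING *address1, *address2, *city, *state, *zip; } ADDRESS_PTR;

/* Embedded variant: skip the line when the field is blank.
   Returns 1 if a line was printed, 0 otherwise. */
static int print_field(FILE *out, const STRING *s)
{
    if (s->len == 0)
        return 0;
    fprintf(out, "%.*s\n", (int)s->len, s->data);
    return 1;
}

/* Pointer variant: one extra NULL test in front of the same length test. */
static int print_field_ptr(FILE *out, const STRING *s)
{
    return s != NULL ? print_field(out, s) : 0;
}

/* Both functions return the number of lines actually printed. */
int print_address_emb(FILE *out, const ADDRESS_EMB *a)
{
    return print_field(out, &a->address1) + print_field(out, &a->address2)
         + print_field(out, &a->city)     + print_field(out, &a->state)
         + print_field(out, &a->zip);
}

int print_address_ptr(FILE *out, const ADDRESS_PTR *a)
{
    return print_field_ptr(out, a->address1) + print_field_ptr(out, a->address2)
         + print_field_ptr(out, a->city)     + print_field_ptr(out, a->state)
         + print_field_ptr(out, a->zip);
}
```

Under these assumptions the pointer variant adds exactly one comparison per field on top of the blank check that the embedded variant already needs.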

When creating or freeing the structure, pointers add a bunch of additional steps. My instinct is to standardize on the embedded style to save these steps, but I am concerned that there might be a hidden gotcha. Is this an unwarranted fear, or should I be using the pointers for some compelling reason?
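The additional steps can be sketched as follows, assuming a STRING that owns a malloced buffer. The helper names (string_init, string_free, string_new, string_delete) and the renamed ADDRESS_EMB and ADDRESS_PTR types are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical STRING that owns a malloced buffer (an assumption). */
typedef struct { char *data; size_t len; } STRING;

typedef struct { STRING  address1, address2, city, state, zip; } ADDRESS_EMB;
typedef struct { STRING *address1, *address2, *city, *state, *zip; } ADDRESS_PTR;

/* Fill an existing STRING with a copy of text; returns 0 on success. */
static int string_init(STRING *s, const char *text)
{
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (s->data == NULL) { s->len = 0; return -1; }
    memcpy(s->data, text, s->len + 1);
    return 0;
}

/* Release the buffer and leave the STRING blank. */
static void string_free(STRING *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}

/* Embedded: a zero-initialized ADDRESS_EMB is already a valid all-blank
   address, and cleanup is one call per field. */
void address_emb_free(ADDRESS_EMB *a)
{
    string_free(&a->address1); string_free(&a->address2);
    string_free(&a->city);     string_free(&a->state);
    string_free(&a->zip);
}

/* Pointer: each populated field needs a second allocation for the
   STRING itself, with its own failure path. */
STRING *string_new(const char *text)
{
    STRING *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    if (string_init(s, text) != 0) { free(s); return NULL; }
    return s;
}

/* Pointer: cleanup has to tolerate NULL and free two things. */
static void string_delete(STRING *s)
{
    if (s == NULL) return;
    string_free(s);
    free(s);
}

void address_ptr_free(ADDRESS_PTR *a)
{
    string_delete(a->address1); a->address1 = NULL;
    string_delete(a->address2); a->address2 = NULL;
    string_delete(a->city);     a->city = NULL;
    string_delete(a->state);    a->state = NULL;
    string_delete(a->zip);      a->zip = NULL;
}
```

In this sketch the pointer design costs one more malloc, one more free and one more failure path per populated field; the work inside STRING is the same in both designs.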

Note that memory use is an issue, but a pretty minor one. The pointer version takes a little more memory because I am storing pointers to the structs in addition to the structs themselves. Each string struct takes maybe 40 bytes on average, so if I am storing 4-byte pointers, the pointer version costs maybe 10% more memory, which is not significant. Allowing null pointers does not save significant memory either, because most fields are populated.
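The arithmetic behind that estimate can be written out as a small sketch. The function names are hypothetical, every field is assumed to be populated, and any per-allocation bookkeeping the allocator adds (which typically exists but is implementation-specific) is ignored.

```c
#include <stddef.h>

/* Extra bytes the pointer layout stores per ADDRESS: one pointer per field.
   Allocator bookkeeping is deliberately ignored in this sketch. */
size_t pointer_overhead_bytes(size_t fields, size_t ptr_size)
{
    return fields * ptr_size;
}

/* The same overhead as a percentage of the string payload, assuming
   every field is populated with avg_string_bytes of STRING data. */
double pointer_overhead_percent(size_t fields, size_t ptr_size,
                                size_t avg_string_bytes)
{
    return 100.0 * (double)(fields * ptr_size)
                 / (double)(fields * avg_string_bytes);
}
```

With 5 fields, 40-byte strings and 4-byte pointers this gives 20 extra bytes on 200, or 10%. On a platform with 8-byte pointers the same formula gives 40 extra bytes, or 20%.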

Question is About ADDRESS not STRING

Some of the respondents seem to be confused and think I am asking about global tradeoffs, like how to minimize my total work. That is not the case. I am asking about how to design ADDRESS, not STRING. The members of ADDRESS could have fixed arrays, or in other cases not. For the purposes of my question, what I am concerned about are the consequences for the container.

I have already stated that the only issue I can see is that it costs more time to use pointers, but I get the benefit of being able to store a NULL. However, as I already said, that benefit does not seem to be significant, but maybe it is for some reason. That is the essence of my question: is there some hidden benefit of having this flexibility that I am not seeing and will wish I had later on?

If you don't understand the question, please read the preliminary answer I have written myself below (after some additional thought) to see the kind of answer I am looking for.

I'm guessing that `STRING` is a `struct` with members to keep track of how much memory has been allocated/used, and a pointer to said memory. But both answers assume that `STRING` is a `typedef` for a fixed length array. You may want to clear that up. — user3386109, Apr 18 '19 at 11:01
A practical answer to this question depends on how the `STRING` structure is defined. — Mark Benningfield, Apr 18 '19 at 14:11
Sometimes I use structs that have fixed length arrays, other times I use embedded structures that have pointers to malloced arrays. Assume that STRING could have either. I am asking about the design of the container here, not the contents. — Tyler Durden, Apr 18 '19 at 14:12
Because the memory considerations amount to a difference of only a few bytes in either case. The practical consideration is how cumbersome it is to manage the lifecycle of an instance of this type. — Mark Benningfield, Apr 18 '19 at 14:15
@MarkBenningfield Okay, that is an answer, but I don't see how that answer changes depending on what is inside STRING. If I have to free stuff inside of the STRING members, then I still have to do that work no matter which of the two designs I use. — Tyler Durden, Apr 18 '19 at 14:18
That's the whole point. The question is whether or not you "have to free stuff inside of the STRING members". — Mark Benningfield, Apr 18 '19 at 14:23
@MarkBenningfield No, that is beside the point. My question is how to design ADDRESS, not how to design STRING. I am not asking how to minimize my total work, I am asking whether there are hidden gotchas with one design of ADDRESS or the other. The fact that I might have to free stuff inside of STRING is not a hidden gotcha, that is something I am well aware of. — Tyler Durden, Apr 18 '19 at 14:28

score 2 · Answer 1 · answered Apr 18 '19 at 10:58

Tradeoffs over memory usage and reduction in mallocs

Seems like the tradeoffs center around two questions: 1) How precious is memory? and 2) Does it matter that a fixed amount of memory is allocated for the strings, limiting the lengths to be stored in each field?

If memory is more important than anything else, then the pointer version probably wins. If predictability of storage usage and avoidance of mallocs is preferred, and limiting the length of the fields to some fixed amount is acceptable, then the fixed-length version may be the winner.

Gardener


The memory issues I think are pretty minor. I will add a comment on that to my question. – Tyler Durden Apr 18 '19 at 14:01
Note that the pointer version does not necessarily save memory because the savings from the occasional null pointer are made up by the cost of storing the pointers themselves. So, for example, in my given cases there are 5 fields so I must store 5 pointers which might cost 5 x 8 = 40 additional bytes compared to the embedded version. – Tyler Durden Apr 18 '19 at 14:08
Until you reveal how STRING is implemented, we are like a bunch of friends playing Dungeons and Dragons, waiting for the Dungeon Master to enlighten us as to the true nature of the world in which we are traipsing. :-) – Gardener Apr 18 '19 at 14:38
Nice analogy, but I recommend reading the question more carefully. If you think the contents of STRING would affect the answer, then you have not understood the question. – Tyler Durden Apr 18 '19 at 14:41
I tend to disagree. Two other members with reputations much higher than mine said the same thing. In fact, both of them said that I had made "assumptions" that were invalid in my answer. – Gardener Apr 18 '19 at 14:46
I have started to figure this problem out and have written an answer of the type I was expecting. – Tyler Durden Apr 18 '19 at 15:49

doron · Answer 2 · 2019-04-18T12:07:26.100

0

One problem with the embedded style is that STRING needs to be defined as something like char[MAX_CHAR + 1], where MAX_CHAR is a pessimistic maximum length for the given fields. The pointer style allows one to allocate the correct amount of memory. The downside, as you mention, is a much higher cognitive overhead in managing your struct.

edited Apr 18 '19 at 12:07

answered Apr 18 '19 at 10:54

doron


_"...STRING needs to be defined as `char[MAX_CHAR + 1]` ..."_: Not necessarily, so far we have absolutely no idea how the STRING type is implemented. – Jabberwocky Apr 18 '19 at 11:19
The point I am making is that each string item will need to have a pessimistic max length. – doron Apr 18 '19 at 12:08
Why? We have no clue how `STRING` is implemented. – Jabberwocky Apr 18 '19 at 12:40
In most cases the STRING structure has pointers inside of it, so it has a flexible memory structure, but this is irrelevant to the question. My question revolves around how to design the container. – Tyler Durden Apr 18 '19 at 14:10
OK I misunderstood the OP's question – doron Apr 18 '19 at 14:56

score -2 · Answer 3 · answered Apr 18 '19 at 15:47

I have been considering this more deeply and I think that in most cases pointers are necessary because it is important to discriminate between blank and missing. The reason for this is that the missing data is needed when an input is invalid or corrupt or left out. For example, let's imagine that when reading from a file, the file is corrupted so a field like zip code is unreadable. In that case the data is "missing" and the pointer should be NULL. On the other hand, let's imagine that the place has no zip code, then it is "blank". So, NULL means that the user has not yet provided information, but blank means the user has provided the information and there is none of the type in question.

So, to further illustrate the importance of using the pointer, imagine that a complex structure is populated over time in different, asynchronous steps. Here we need to know what fields have been read, and which ones have not. Unless we are using pointers (or adding additional metadata), we have no way of telling the difference between a field that has been answered and one for which the answer is "none". Imagine the system prompting the user "what is the zip code?". User says, "this place has no zip code". Then 5 minutes later the system asks again, "what is the zip code?". That use case makes it clear that we need pointers in most cases.

In this light, the only situation where I should use embedded structs is when the container structure is guaranteed to have a full set of data whenever it is created.
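The three states described above can be sketched with the pointer design as follows. The STRING layout and the names FIELD_STATE, field_state and needs_prompt are hypothetical, introduced only to illustrate the idea.

```c
#include <stddef.h>

/* Hypothetical STRING layout, assumed only for this sketch. */
typedef struct { const char *data; size_t len; } STRING;

typedef enum {
    FIELD_MISSING,  /* NULL pointer: never answered, or unreadable input */
    FIELD_BLANK,    /* answered, and the answer is "none" */
    FIELD_VALUE     /* answered with actual text */
} FIELD_STATE;

/* Classify one pointer-style member of ADDRESS. */
FIELD_STATE field_state(const STRING *s)
{
    if (s == NULL)
        return FIELD_MISSING;
    if (s->len == 0)
        return FIELD_BLANK;
    return FIELD_VALUE;
}

/* Ask again only for fields that were never answered, so a user who
   said "this place has no zip code" is not prompted a second time. */
int needs_prompt(const STRING *s)
{
    return field_state(s) == FIELD_MISSING;
}
```

With embedded STRINGs and no extra metadata, FIELD_MISSING and FIELD_BLANK would collapse into the same zero-length case, which is the problem the zip code example describes.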

The distinction between "empty" and "missing" should be in the STRING itself. That way, the STRING (by itself) can be passed to a function without losing the information about whether the string is present, empty, or missing. In other words, a `STRING` should be self-contained. Whether the container uses pointers to STRINGs or embedded STRINGs is completely unrelated to the problem of how to track a missing string. — user3386109, Apr 18 '19 at 21:28
@user3386109 I don't want to add that kind of metadata to my string container. — Tyler Durden, Apr 18 '19 at 21:33
