
In a C++ open-source project, I saw this:

struct SomeClass {
  ...
  size_t data_length;
  char data[1];
  ...
};

What are the advantages of doing so rather than using a pointer?

struct SomeClass {
  ...
  size_t data_length;
  char* data;
  ...
};

The only thing I can think of is with the size 1 array version, users aren't expected to see NULL. Is there anything else?

Calmarius
Russell
  • This has a lot of potential benefits over a pointer, but none that I can think of over a plain `char`. – Fred Foo Jun 17 '11 at 18:47
  • Maybe a duplicate of http://stackoverflow.com/questions/1704407/what-is-the-difference-between-char-s-and-char-s-in-c? – Yet Another Geek Jun 17 '11 at 18:47
  • That is the ISO C90 idiom for a variable length structure. You are right insofar as a pointer would do the same, but this one allows for a more comfortable access. In ISO C99 you would use empty brackets instead. A one-element array is valid for C++ too, I'm not sure whether that is true for empty brackets as well (though gcc does support it anyway). – Damon Jun 17 '11 at 18:47
  • @larsman: The benefit is if the memory allocation is larger than `sizeof(SomeClass)`, so you can use `data` as an array of a size larger than 1. – Mike Seymour Jun 17 '11 at 18:49
  • @Mike: I'm aware of that. It's hardly a benefit. – Fred Foo Jun 17 '11 at 18:51
  • @larsman: it is if you want a variable sized array without the overhead of a second memory allocation. – Mike Seymour Jun 17 '11 at 18:52
  • @Yet Another Geek, good point. The answer requires knowledge of the same concept but I think I asked it in a different context and might not be considered duplicate. I'll have the community decide. Thanks. – Russell Jun 17 '11 at 18:53
  • @larsmans: simply stating that something from a standard (flexible array members in C99, or in this case the closest C89 or C++ approximation to them) is "hardly a benefit" might accurately represent your opinion, but probably isn't helpful unless you give reasons why it's a misfeature. I guess you could say, "it's C++, use a vector", to which the response is "interfacing with C APIs", so my question is still "why is this not a benefit in C" and hence, "why are flexible array members a misfeature of C99?" – Steve Jessop Jun 17 '11 at 19:22
  • @Steve: as Johannes notes, this construct leads to undefined behavior; sorry for not being clearer, but you summed it up quite nicely. – Fred Foo Jun 17 '11 at 19:30

6 Answers


With this, you don't have to allocate the memory elsewhere and make the pointer point to that.

  • No extra memory management
  • Accesses to the data are (much) more likely to hit the memory cache

The trick is to allocate more memory than sizeof (SomeClass), and make a SomeClass* point to it. Then the initial memory will be used by your SomeClass object, and the remaining memory can be used by the data. That is, you can say p->data[0] but also p->data[1] and so on up until you hit the end of memory you allocated.

It can be argued that this use results in undefined behavior, though, because the array is declared with only one element but accessed as if it contained more. Real compilers do allow this with the expected meaning, because C++ has no alternative syntax for expressing it (C99 does; it is called a "flexible array member" there).
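
As a minimal sketch of the trick described above, assuming a reduced SomeClass that has only the two members shown in the question (the helper name make_some_class is hypothetical and not taken from the project):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the struct in the question: only the two
// members shown there, with the array as the last member.
struct SomeClass {
    std::size_t data_length;
    char data[1];
};

// Hypothetical helper: one malloc holds the header and `len` payload bytes.
SomeClass* make_some_class(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);

    // The payload starts at offsetof(SomeClass, data); never allocate
    // less than the struct itself.
    std::size_t bytes = offsetof(SomeClass, data) + len;
    if (bytes < sizeof(SomeClass)) bytes = sizeof(SomeClass);

    SomeClass* p = static_cast<SomeClass*>(std::malloc(bytes));
    if (p == nullptr) return nullptr;

    p->data_length = len;
    // Writes past data[0] whenever len > 1; this relies on the extra
    // bytes allocated above.
    if (len > 0) std::memcpy(p->data, src, len);
    return p;
}
```

A caller would write something like `SomeClass* p = make_some_class("hello", 5);` and later release header and payload together with a single `std::free(p);`. This only shows what compilers accept in practice; it does not settle whether indexing past `data[0]` is strictly defined.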

Johannes Schaub - litb
  • @larsmans: Because not all C compilers allow zero-sized arrays. (C89 did not, I believe. `gcc` probably does) – Ben Zotto Jun 17 '11 at 18:50
  • @larsmans because declaring an array of size zero is not allowed - `char x[0];` is invalid to say in C++ and C, it requires a diagnostic message (error / warning message). – Johannes Schaub - litb Jun 17 '11 at 18:50
  • If the OP's code example is to be believed, then I don't think this answer is correct. He has "..." after the `char data[1]`, implying that there are more fields following. – Oliver Charlesworth Jun 17 '11 at 18:53
  • Ah, so there can only be one of this array in a class and it has to be at the end too right? The ... following is not exactly more fields, they're just inline function definitions. I didn't think that was relevant. Sorry I wasn't clear. – Russell Jun 17 '11 at 18:55
  • @Oli that's a good point I didn't notice. Though if you take into account the questioner wasn't aware of the special meaning, he also wasn't aware that this only makes sense for the last member of the struct, so he may just have inserted the "..." at both ends to keep it symmetric. – Johannes Schaub - litb Jun 17 '11 at 18:57
  • @Russell yeah this trick is used only when you have variable space at the end of objects. Like, if you have a string class where the char data follows the main object. Or when only some of your `foo` objects contain additional data and most others don't and you don't want to waste that space for all the others. – Johannes Schaub - litb Jun 17 '11 at 19:00
  • @Johannes: why is it undefined behavior? In, say `data[2]`, the definition of array indexing causes `data` to decay to a `char*`, which is a valid type-pun for that byte in the middle of whatever allocation I've stuffed this struct into, and I can validly do pointer arithmetic on it as such. Assuming the allocation was big enough, of course. Or does the fact that my pointer originally came from an array decay, sort of infect it with undefinedness because I'm "indexing an array"? If so, what if I wrote `*(&data[0]+2)` instead of `data[2]`, so that I'm "using a pointer" instead of "indexing"? – Steve Jessop Jun 17 '11 at 19:39
  • @Steve because the pointer points to an element of an array of type `char[1]`. So you are technically only allowed to say `data[0]` and `data[1]`. But `data[2]` is undefined behavior (not necessarily because of the dereference, but because `data + 2` is undefined behavior). But in practice, I don't think this matters, so since allowing it is actually useful real compilers do allow it. – Johannes Schaub - litb Jun 17 '11 at 19:43
  • I still disagree that `data+2` is undefined behaviour. `data` is just a `char*` pointer into the middle of some allocation that I made. I'm allowed to do that, and I'm allowed to add 2 to it. `(char*)(pSomeClass->data_length)) + sizeof(size_t) + 2` is allowed, provided the allocation is big enough, because as I understand it my allocation is still an array of char, even though I've created another struct inside it. So why not `data+2`? Now, if `data` was an array of some type other than `char` or `unsigned char`, that would be another matter. – Steve Jessop Jun 17 '11 at 19:49
  • @Steve unfortunately, the spec is not clear when an array of char starts lifetime and when it stops. The spec says that a non-class type starts lifetime as soon as you have proper memory. But that cannot be true, because then the aliasing rule would be unable to work. The spec has not been fixed, so I will assume the worst: whether `int *x = (int*)malloc(2 * sizeof(int)); x + 2;` is defined behavior is currently not specified by the C++ standard. If you know where it is and how it works with aliasing, I would be glad if you explained it. – Johannes Schaub - litb Jun 17 '11 at 19:58
  • In particular, when it comes to dynamic memory, I think that the common interpretation is that the above `x + 2` "just works", even though the spec is not clear how the assumed array object starts existence (there has been *a lot* of discussion lately on usenet about what the C++ spec says, without result). But I also think that the common interpretation for *declared* arrays like `char data[2];` or `int data[2]` is that you can only add `0`, `1` and `2` to it. Adding `3` is undefined behavior, regardless of whether or not there may be storage available. – Johannes Schaub - litb Jun 17 '11 at 20:02
  • @Johannes: so for another example, what about `ptr = new char[sizeof(int)*3]; ptr2 = ptr+sizeof(int); new (ptr) int(0); new (ptr2) int(0); ptr += sizeof(int);`. Is that last line undefined behaviour, on the basis that I no longer have an array of chars, and I've exceeded the bounds of my `int` that I placement-newed? If so, does that mean it's also UB to `memcpy` into a `vector`, since this is exactly how vector creates itself? If there's a contradiction in the standard, I'm inclined to assume that the defect resolution falls in the direction of not breaking the whole rest of the standard... – Steve Jessop Jun 17 '11 at 20:04
  • And yes, for declared arrays I certainly agree, forming the pointer past the end-of-array pointer is UB, and deliberately so. I'm willing to consider that the standard may be flawed to the extent that this other thing is *accidentally* UB, along with presumably a lot of other things that the standard states or implies are legal. But it seems wrong to say, "you can't do that because of a defect in the standard" without really working out what those other lot of things are that you also can't do. If C++ contains a contradiction then, logically, *anything* is UB. – Steve Jessop Jun 17 '11 at 20:06
  • @Steve fwiw, the C99 rationale gives as a primary reason for introduction of FAMs into C99 "The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists.". Unfortunately, there is a lot of weasel wording in the spec in C++ too. – Johannes Schaub - litb Jun 17 '11 at 20:17
  • I do see your point about *But it seems wrong to say, "you can't do that because of a defect in the standard* though, so I have changed my answer to step back from saying unconditionally that this is undefined behavior. – Johannes Schaub - litb Jun 17 '11 at 20:18
  • Does Windows API use this hack with the `cbClsExtra` member of the `WNDCLASS` field? (Old post sorry) – lost_in_the_source Dec 22 '15 at 01:51

This is usually a quick (and dirty?) way of avoiding multiple memory allocations and deallocations, though it's more C style than C++.

That is, instead of this:

struct SomeClass *foo = malloc(sizeof *foo);
foo->data = malloc(data_len);
memcpy(foo->data,data,data_len);

....
free(foo->data);
free(foo);

You do something like this:

struct SomeClass *foo = malloc(sizeof *foo + data_len);
memcpy(foo->data,data,data_len);

...
free(foo);

In addition to saving (de)allocation calls, this can also save a bit of memory, as no space is needed for a pointer, and you could even use space that would otherwise have been struct padding.
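
To make the memory argument concrete, here is a rough sketch that compares how many bytes each layout requests for `n` payload bytes. The struct and function names are hypothetical, the structs are reduced to the two members shown in the question, and the allocator's own per-block bookkeeping is ignored. Unlike the `sizeof *foo + data_len` form above, it sizes the single block from `offsetof`, so the payload can start in what would otherwise be tail padding:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reductions of the two layouts from the question.
struct WithPointer {
    std::size_t data_length;
    char* data;    // payload lives in a second allocation
};

struct WithArray {
    std::size_t data_length;
    char data[1];  // payload starts here, inside the same allocation
};

// Bytes requested in total when header and payload are separate blocks.
std::size_t bytes_two_blocks(std::size_t n) {
    assert(n >= 1);
    return sizeof(WithPointer) + n;
}

// Bytes requested when the payload follows the header in one block.
std::size_t bytes_one_block(std::size_t n) {
    assert(n >= 1);
    std::size_t bytes = offsetof(WithArray, data) + n;
    // Never request less than the struct itself.
    return bytes < sizeof(WithArray) ? sizeof(WithArray) : bytes;
}
```

The exact figures depend on the platform's pointer size and alignment rules, so the sketch deliberately returns sizes instead of hard-coding them.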

Lyke

Usually you see this as the final member of a structure. Then whoever mallocs the structure will allocate all the data bytes consecutively in memory as one block to "follow" the structure.

So if you need 16 bytes of data, you'd allocate an instance like this:

SomeClass *pObj = static_cast<SomeClass *>(malloc(sizeof(SomeClass) + (16 - 1)));

Then you can access the data as if it were an array:

pObj->data[12] = 0xAB;

And you can free all the stuff with one call, of course, as well.

The data member is a single-item array by convention because older C compilers (and apparently the current C++ standard) don't allow a zero-sized array. Nice further discussion here: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
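
In C++ code, the "one call frees everything" property could be wrapped in a smart pointer. The following is only a sketch under assumptions: SomeClass is reduced to the two members from the question, and FreeDeleter, SomeClassPtr and allocate_some_class are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for the struct in the question.
struct SomeClass {
    std::size_t data_length;
    char data[1];
};

// Deleter that releases header and payload with a single std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using SomeClassPtr = std::unique_ptr<SomeClass, FreeDeleter>;

// Hypothetical helper using the sizeof(SomeClass) + (n - 1) formula:
// one data byte is already counted in sizeof(SomeClass).
SomeClassPtr allocate_some_class(std::size_t n) {
    std::size_t extra = n > 0 ? n - 1 : 0;
    // calloc zero-fills the block, so the payload starts out as all zeros.
    void* raw = std::calloc(1, sizeof(SomeClass) + extra);
    SomeClassPtr p(static_cast<SomeClass*>(raw));
    if (p) p->data_length = n;
    return p;
}
```

With this, `auto obj = allocate_some_class(16);` yields an object whose `data[0]` through `data[15]` are usable in practice, and the whole block is freed when `obj` goes out of scope.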

Ben Zotto

They are semantically different in your example.

char data[1] is a valid array of char with one uninitialized element stored inside the struct itself. You could write data[0] = 'w' and your program would be correct.

char* data; simply declares a pointer that is invalid until initialized to point to a valid address.
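
The difference can be checked at compile time. This is a small sketch with hypothetical struct and function names, reduced to the two members shown in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct WithArray {
    std::size_t data_length;
    char data[1];
};

struct WithPointer {
    std::size_t data_length;
    char* data;
};

// The array member is storage; the pointer member only refers to storage.
static_assert(std::is_array<decltype(WithArray::data)>::value, "array member");
static_assert(std::is_pointer<decltype(WithPointer::data)>::value, "pointer member");
static_assert(sizeof(WithArray::data) == 1, "exactly one char of storage");

// Hypothetical demo: data[0] exists as soon as the struct does.
char write_through_array() {
    WithArray a{};
    a.data[0] = 'w';  // valid without any further setup
    return a.data[0];
}

// Hypothetical demo: the pointer needs a target before it can be used.
char write_through_pointer() {
    char storage = 0;
    WithPointer p{};               // value-initialized, so p.data is null
    assert(p.data == nullptr);     // dereferencing it now would be invalid
    p.data = &storage;             // point it at real storage first
    p.data[0] = 'w';
    return storage;
}
```

Both functions return 'w', but only the pointer version needs separate storage to be set up before the write.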

Ed S.
  • I'm not sure how this was ignored. A pointer and a character are not the same thing. – Jay Jun 17 '11 at 19:32
  1. The structure can be simply allocated as a single block of memory instead of multiple allocations that must be freed.

  2. It actually uses less memory because it doesn't need to store the pointer itself.

  3. There may also be performance advantages with caching due to the memory being contiguous.

Jonathan Wood

The idea behind this particular thing is that the rest of the data fits in memory directly after the struct. Of course, you could just do that anyway.

Puppy