1

I see this in code sometimes:

struct S
{
    int count; // length of array in data
    int data[1];
};

Where the storage for S is allocated bigger than sizeof(S) so that data can have more space for its array. It is then used like:

S *s;

// allocation

s->data[3] = 1337;

My question is, why is data not a pointer? Why the length-1 array?

Oliver Zheng
  • 7,831
  • 8
  • 53
  • 59
  • Doesn't `int data[1]` represent a `int *data`? – phimuemue Sep 30 '10 at 17:14
  • Are you saying the allocation is something like `s=malloc(sizeof(S)+sizeof(int)*4)` to create an array of size 5? – Dusty Sep 30 '10 at 17:16
  • 2
    @phimuemue: No, it doesn't. Array is not a pointer. – AnT stands with Russia Sep 30 '10 at 17:25
  • 1
    Related question: [Is the "struct hack" technically undefined behavior?](http://stackoverflow.com/questions/3711233/is-the-struct-hack-technically-undefined-behavior) – Kristopher Johnson Sep 30 '10 at 18:19
  • 1
    A *variable length array* (VLA) is something completely different: An array whose size is not an constant integer expression. You probably mean *flexible array member*, the legal successor to the dubious struct hack. – schot Sep 30 '10 at 20:21

6 Answers6

9

If you declare data as a pointer, you'll have to allocate a separate memory block for the data array, i.e. you'll have to make two allocations instead of one. While there won't be much difference in the actual functionality, it still might have some negative performance impact. It might increase memory fragmentation. It might result in struct memory being allocated "far away" from the data array memory, resulting in the poor cache behavior of the data structure. If you use your own memory management routines, like pooled allocators, you'll have to set up two allocators: one for the struct and one for the array.

By using the above technique (known as "struct hack") you allocate memory for the entire struct (including data array) in one block, with one call to malloc (or to your own allocator). This is what it is used for. Among other things it ensures that struct memory is located as close to the array memory as possible (i.e. it is just one continuous block), so the cache behavior of the data structure is optimal.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • 1
    It's not necessary to make two allocations, if `data` is a pointer. One can allocate memory like this: `malloc(sizeof(S) + N*sizeof(int))` and set the pointer to: `s.data = (int*)(&s+1)`. The problem is that the pointer has to be set and accessing `data` requires one more level of indirection. – Maciej Hehl Sep 30 '10 at 18:38
  • @Maciej Hehl: Yes, it is possible to do it that way as well. It is often done that way when you have *multiple* variable length arrays in the struct. However, it is not as easy as you suggest. In general you have to make sure that the array memory is properly aligned. Your `(int *) (ptr_to_s + 1)` does not ensure that. Two allocations would solve that problem automatically. "Struct hack" does as well. But with pointer and one allocation it is your responsibility to take care of alignment. – AnT stands with Russia Sep 30 '10 at 18:40
  • In this case (array of ints and a `struct` consisting of an `int` and `int` pointer) the memory should be aligned properly. In a general case, yes alignment might be a problem, but "struct hack" doesn't solve the problem automatically. The problem is solved if the compiler supports an extension, that allows to write `int data[];` or `int data[0];` in the `struct` definition. If it doesn't, a `struct` with the last field like: `char data[1];` can be padded at the end and it has to be taken into account. – Maciej Hehl Sep 30 '10 at 18:58
  • @Maciej Hehl: Er... But that's how "struct hack" is implemented: by adding `int data[1]` at the end of the struct, or `int data[]` in C99. What you see in the OP *is* the "struct hack". So, "struct hack" *does* solve the alignment problem automatically. And there's no need for any "compiler extension" to add `int data[1]` at the end of the struct. – AnT stands with Russia Sep 30 '10 at 19:05
  • 1
    You must be thinking about the "braindead" version of struct hack where 0-sized array is used instead of 1-sized array - that would indeed require a compiler extension. But nobody forces you to use that strange 0-sized version. Just do it properly and use 1-sized array in C89/90 and size-less array declaration in C99. – AnT stands with Russia Sep 30 '10 at 19:07
  • I have a 32 bit system. In my implementation, If I add `char data[1];` at the end of the `struct` that also contains an `int` field, `data` is not "at the end". It's right after the `int` field and there are 3 padding bytes **after** `data`, so the size of the `struct` is 8. – Maciej Hehl Sep 30 '10 at 19:26
  • 1
    @Maciej Hehl: So what? What is the importance of that example? Where do you see the problem? – AnT stands with Russia Sep 30 '10 at 19:47
  • I see the problem in the fact, that when I say `malloc(sizeof(S) + N)` in the case described above, there are 3 bytes, that don't belong to me, between `data[0]` and the rest of allocated memory. I feel uncomfortable accessing those padding bytes. Maybe it's not a problem at all, since we are in the hack territory anyway, but I don't know about it yet. That's what I mean, when I say, that alignment problem isn't solved by the hack, because I think I still have to think about those padding bytes. If I'm wrong, thank you for the discussion and the information. – Maciej Hehl Sep 30 '10 at 20:05
  • @Maciej Hehl: In the case described above we'd allocate memory as `malloc(offsetof(S, data) + N * sizeof(int))`. And then access `data` as an array from `0` to `N-1`. There's a hack in this, all right, but that's why it is called "struct hack". – AnT stands with Russia Sep 30 '10 at 21:13
  • 1
    @Maciej also if you do that pointer version, simply memcpy the struct to another allocated block won't do it. You will have to adjust the pointer manually for each copy you do, and writing out the memory block to the network/a file will include writing out the bytes of the pointer, which is usually not desired. – Johannes Schaub - litb Sep 30 '10 at 23:46
6

Raymond Chen wrote an excellent article on precisely why variable length structures chose this pattern over many others (including pointers).

He doesn't directly comment on why a pointer was chosen over an array but Steve Dispensa provides some insight in the comments section.

From Steve

typedef struct _TOKEN_GROUPS { 
DWORD GroupCount; 
SID_AND_ATTRIBUTES *Groups; 
} TOKEN_GROUPS, *PTOKEN_GROUPS; 

This would still force Groups to be pointer-aligned, but it's much less convenient when you think of argument marshalling.

In driver development, developers are sometimes faced with sending arguments from user-mode to kernel-mode via a METHOD_BUFFERED IOCTL. Structures with embedded pointers like this one represent anything from a security flaw waiting to happen to simply a PITA.

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
1

It's done to make it easier to manage the fact that the array is sequential in memory (within the struct). Otherwise, after the memalloc that is greater than sizeof(S), you would have to point 'data' at the next memory address.

Jose
  • 311
  • 1
  • 3
1

Because it lets you have code do this:

struct S
{
    int count; // length of array in data
    int data[1];
};

    struct S * foo;

    foo = malloc(sizeof(struct S) + ((len - 1)*sizeof(int)) );
    strcpy(foo->data, buf);

Which only requires one call to malloc and one call to free.

This is common enough that the C99 standard allows you do not even specify a length of the array. It's called a flexible array member.

From ISO/IEC 9899:1999, Section 6.7.2.1, paragraph 16: "As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member." called a flexible array member."

struct S
{
    int count; // length of array in data
    int data[];
};

And gcc has allowed 0 length array members as the last members of structs as an extension for a while.

nategoose
  • 12,054
  • 27
  • 42
0

Because of different copy semantics. If it is a pointer inside, then the contents have to explicitly copied. If it is a C-style array inside, then the copy is automatic.

Arun
  • 19,750
  • 10
  • 51
  • 60
  • The OP is obviously describing a "struct hack" ("...where the storage for S is allocated bigger than sizeof(S)..."). You can't use built-in copying (i.e. assignment) to copy "struct hack"-ed structs. You have to copy them explicitly (with `memcpy` or something like that). – AnT stands with Russia Sep 30 '10 at 19:10
0

Incidentally, I don't think there's any guarantee that using a length-one array as something longer is going to work. A compiler would be free to generate effective-address code that relies upon the subscript being no larger than the specified bound (e.g. if an array bound is specified as one, a compiler could generate code that always accesses the first element, and if it's two, on some platforms, an optimizing compiler might turn a[i] into ((i & 1) ? a[1] : a[0]). Note that while I'm unaware of any compilers that actually do that transform, I am aware of platforms where it would be more efficient than computing an array subscript.

I think a standards-compliant approach would be to declare the array as [MAX_SIZE] and allocate sizeof(struct S)-(MAX_SIZE-len)*sizeof(int) bytes.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • See http://stackoverflow.com/questions/3711233/is-the-struct-hack-technically-undefined-behavior for more on this. – Kristopher Johnson Sep 30 '10 at 18:18
  • @Kristopher Johnson: I wrote the last answer there, but it got no comments. None of the other comments seemed to mention why a compiler might have trouble with the struct hack, but it would seem entirely reasonable for a compiler to generate code which assumes an array will only be indexed within the specified bounds. I wish the authors of the standard had allowed zero-length arrays, since they could be interpreted as the one situation where the struct hack was permissible. – supercat Sep 30 '10 at 22:12