
I recently read that using flexible array members in C was poor software engineering practice. However, that statement was not backed by any argument. Is this an accepted fact?

(Flexible array members are a C feature introduced in C99 whereby one can declare the last member of a struct to be an array of unspecified size.) For example:

struct header {
    size_t len;
    unsigned char data[];
};
maxschlepzig

7 Answers


It is an accepted "fact" that using goto is poor software engineering practice. That doesn't make it true. There are times when goto is useful, particularly when handling cleanup and when porting from assembler.

Flexible array members strike me as having one main use, off the top of my head, which is mapping legacy data formats like window template formats on RiscOS. They would have been supremely useful for this about 15 years ago, and I'm sure there are still people out there dealing with such things who would find them useful.

If using flexible array members is bad practice, then I suggest that we all go tell the authors of the C99 spec this. I suspect they might have a different answer.

Manos Nikolaidis
Airsource Ltd
  • goto is also useful when we want to turn a recursive algorithm into a non-recursive implementation, in those cases where recursion could add overhead on the compiler. – pranavk Oct 16 '12 at 18:12
  • @pranavk You should probably be using `while`, then. – yyny Oct 17 '16 at 22:35
  • Network programming is another: you have the header as a struct, and the packet (or whatever it is called in the layer you are in) as the flexible array. When calling the next layer, you strip off the header and pass the packet on. Do this for each layer in the network stack. (You cast the data received from the lower layer to the struct for the layer you are in.) – fhtuft Jan 09 '17 at 09:55
  • @pranavk `goto` is **not** for loops. – Константин Ван May 10 '20 at 08:17
  • "There are times when goto is useful" See, this is why I sometimes shudder while thinking some kid who's just learning to program will resort to StackOverflow for learning best practices. – Martin Jun 29 '21 at 18:38
  • @Martin I'm not sure I see your point. Are you disagreeing that it's useful? No one is suggesting here that this is best practice. Porting legacy code and best practice rarely go hand-in-hand. Many of the solutions I find on SO to, say, iOS problems, are hacks and certainly not best practice - but often they are the only solution to the problem. – Airsource Ltd Jun 29 '21 at 19:20
  • goto is never useful. Sprinkling additional gotos in legacy code is _especially_ bad. – Martin Jun 30 '21 at 20:09
  • Flexible array members are used for variable-length data: `sizeof(struct header)` is added to `sizeof(unsigned char) * n` before being passed to `malloc()`, where `n` is the desired length of `data`. Exercise caution, however; API functions have *no idea* how much memory you've allocated, and will readily segfault if you tell them your array is bigger than it actually is. – GooseDeveloper Oct 22 '21 at 17:35
  • If you are writing by-hand serialization code, flexible array members come in handy. – dddJewelsbbb Jan 07 '22 at 03:02
  • `while` and `for` have a lot of semantics packed into those very terse statements. I would like to see a single yet simple `loop` statement that works with `goto` (in place of "break" and "continue") as a replacement for `while` and `for`. Complex loops will be documented with well-named labels inside a `loop block`; for everything simpler, you just `loop if (predicate_expression) block`, `loop quantifier block`, or `loop block`. And let's not forget the obvious: `goto` jump tables (aka switches with user-defined semantics)! Very useful for directed graph logic programming! – AMDG Jun 12 '22 at 18:22
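
The by-hand serialization use mentioned in the comments can be sketched as follows. This assumes a hypothetical wire format (a 2-byte big-endian length followed by that many payload bytes); `struct record` and `record_parse` are illustrative names, not from any library. The payload is copied into the flexible array member instead of casting the raw buffer to a struct, which sidesteps alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: a length followed by the payload bytes. */
struct record {
    size_t len;
    unsigned char data[];   /* flexible array member */
};

/* Parse the hypothetical format: 2-byte big-endian length, then payload.
 * Returns a heap-allocated record (release with free()), or NULL on error. */
struct record *record_parse(const unsigned char *buf, size_t buflen)
{
    if (buflen < 2)
        return NULL;                        /* too short for the length prefix */
    size_t len = ((size_t)buf[0] << 8) | buf[1];
    if (buflen - 2 < len)
        return NULL;                        /* payload is truncated */

    /* One allocation holds both the fixed part and the payload. */
    struct record *r = malloc(sizeof *r + len);
    if (r == NULL)
        return NULL;
    r->len = len;
    memcpy(r->data, buf + 2, len);          /* copy the bytes, don't cast the buffer */
    return r;
}
```

Because the length prefix is at most 65535, `sizeof *r + len` cannot overflow here; a format with a wider length field would need an explicit overflow check.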

No, using flexible array members in C is not bad practice.

This language feature was first standardized in ISO C99, 6.7.2.1 (16). In the following revision, ISO C11, it is specified in Section 6.7.2.1 (18).

You can use them like this:

#include <stdlib.h>

struct Header {
    size_t d;
    long v[];
};
typedef struct Header Header;
size_t n = 123; // can dynamically change during program execution
// ...
Header *h = malloc(sizeof(Header) + sizeof(long[n]));
if (h != NULL)
    h->d = n;   // store the element count in the member d

Alternatively, you can allocate like this:

Header *h = malloc(sizeof *h + n * sizeof h->v[0]);

Note that sizeof(Header) includes any padding bytes; thus, the following allocation is incorrect and may yield a buffer overflow:

Header *h = malloc(sizeof(size_t) + sizeof(long[n])); // invalid!
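
A minimal sketch of a complete allocation helper for the `Header` type above. `header_new` is a hypothetical name, and the overflow check and zero-initialization are assumptions added here, not part of the answer. It uses the `sizeof *h + n * sizeof h->v[0]` form, which cannot under-allocate because `sizeof(Header)` is never smaller than `offsetof(Header, v)`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct Header {
    size_t d;   /* number of elements in v */
    long v[];   /* flexible array member */
};
typedef struct Header Header;

/* Hypothetical constructor: a single malloc() for the header and n longs.
 * Returns NULL if the size computation would overflow or malloc() fails. */
Header *header_new(size_t n)
{
    /* Guard against sizeof(Header) + n * sizeof(long) wrapping around. */
    if (n > (SIZE_MAX - sizeof(Header)) / sizeof(long))
        return NULL;
    Header *h = malloc(sizeof *h + n * sizeof h->v[0]);
    if (h == NULL)
        return NULL;
    h->d = n;
    for (size_t i = 0; i < n; i++)
        h->v[i] = 0;   /* malloc() leaves the elements uninitialized */
    return h;
}
```

A single `free(h)` releases the header and all of its elements together.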

A struct with a flexible array member halves the number of allocations it needs: instead of 2 allocations per struct object, you need just 1. This means less work and less memory spent on allocator bookkeeping overhead. Furthermore, you save the storage for one additional pointer. Thus, if you have to allocate a large number of such struct instances, you can measurably improve the runtime and memory usage of your program (by a constant factor).
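
To make the allocation count concrete, here is a sketch contrasting the two layouts. `PtrVec`, `FamVec` and their constructors are hypothetical names, and overflow checks are omitted for brevity. The pointer-based version needs two `malloc()` calls, two `free()` calls and room for an extra pointer in every object; the flexible-array version needs one of each and no pointer.

```c
#include <stdlib.h>

/* Two allocations: the struct plus a separately allocated array. */
struct PtrVec {
    size_t n;
    long *v;     /* extra pointer stored in every object */
};

/* One allocation: the array lives directly after the fixed members. */
struct FamVec {
    size_t n;
    long v[];
};

struct PtrVec *ptrvec_new(size_t n)
{
    struct PtrVec *p = malloc(sizeof *p);          /* allocation 1 */
    if (p == NULL)
        return NULL;
    p->v = malloc(n * sizeof p->v[0]);             /* allocation 2 */
    if (p->v == NULL && n != 0) {
        free(p);
        return NULL;
    }
    p->n = n;
    return p;
}

void ptrvec_free(struct PtrVec *p)
{
    if (p != NULL) {
        free(p->v);                                /* two frees needed */
        free(p);
    }
}

struct FamVec *famvec_new(size_t n)
{
    /* The only allocation; released later with a single free(f). */
    struct FamVec *f = malloc(sizeof *f + n * sizeof f->v[0]);
    if (f != NULL)
        f->n = n;
    return f;
}
```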

In contrast, emulating flexible array members with non-standard constructs that yield undefined behavior (e.g. long v[0]; or long v[1];) is clearly bad practice; like any undefined behavior, it should be avoided.

Since ISO C99 was released in 1999, more than 20 years ago, striving for ISO C89 compatibility is a weak argument.

maxschlepzig

PLEASE READ THE COMMENTS BELOW THIS ANSWER CAREFULLY

As C standardization moves forward, there is no reason to use [1] anymore.

The reason I would give for not doing it is that it's not worth it to tie your code to C99 just to use this feature.

The point is that you can always use the following idiom:

struct header {
  size_t len;
  unsigned char data[1];
};

That is fully portable. Then you can take the 1 into account when allocating the memory for n elements in the array data:

ptr = malloc(sizeof(struct header) + (n-1));

If you already require C99 to build your code for some other reason, or you are targeting a specific compiler, I see no harm.
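
For comparison, here is a sketch of the C99 form that the comments recommend in place of `data[1]`. The array is declared without a size, so no `n-1` adjustment is needed, and indexing `data[0]` through `data[n-1]` stays within the allocated object. `header_alloc` is a hypothetical helper name. The code allocates `sizeof(struct header) + n`, which is always large enough; `offsetof(struct header, data) + n` (suggested in the comments) can be a few bytes smaller when the struct has trailing padding.

```c
#include <stddef.h>
#include <stdlib.h>

struct header {
    size_t len;
    unsigned char data[];   /* C99 flexible array member instead of data[1] */
};

/* Hypothetical helper: allocate a header with room for n bytes of data.
 * Returns NULL if malloc() fails; release the result with free(). */
struct header *header_alloc(size_t n)
{
    /* sizeof *ptr already excludes the flexible array, so just add n bytes. */
    struct header *ptr = malloc(sizeof *ptr + n);
    if (ptr != NULL)
        ptr->len = n;
    return ptr;
}
```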

Remo.D
  • The last line should be ptr = malloc(sizeof(header) + n); where n is the length of the string and you use the 1 as terminating \0. – Peter Olsson Oct 29 '08 at 14:38
  • Thanks. I left the n-1 since it might not be used as a string. – Remo.D Oct 29 '08 at 14:40
  • You wouldn't care about the sign if it were definitely a string. As it stands, n-1 is correct. – botismarius Oct 29 '08 at 16:47
  • The 'following idiom' is not fully portable, which is why flexible array members were added to the C99 standard. – Jonathan Leffler Nov 01 '08 at 00:50
  • I can say that using this approach does generate big problems. For example, if you are using Secure CRT functions and you try to do a `strcpy(data, sometext)` you will get buffer underrun errors at runtime. – sorin Dec 29 '09 at 07:10
  • @sorin not sure what you mean. The problem you're talking about is related to using strcpy() instead of strncpy() the question is about creating arrays that can grow. – Remo.D Jan 06 '10 at 14:39
  • @Jonathan. Sorry I don't get why this is not portable, could you clarify better? – Remo.D Jan 06 '10 at 14:41
  • Jonathan's point is echoed by the committee (or at least some members), but it contradicts the facts. When considered together, several other parts of the standard **require** the old `[1]` trick to work just as well as `[]` (aside from possibly wasting a few extra bytes of storage). – R.. GitHub STOP HELPING ICE Aug 06 '10 at 15:35
  • @Remo.D: minor point: the `n-1` does not accurately account for the extra allocation, because of alignment: on most 32-bit machines, sizeof(struct header) will be 8 (to remain a multiple of 4, since it has a 32-bit field which prefers/requires such alignment). The "better" version is: `malloc(offsetof(struct header, data) + n)` – Tom Leek Aug 26 '12 at 19:54
  • @Tom: The minus one is there to account for the 1 already present in the declaration `data[1]`. Of course malloc() can (and usually will) allocate more memory to comply with the alignment rules or do whatever it needs to do to manage the memory block. – Remo.D Nov 30 '12 at 22:31
  • In C99 using `unsigned char data[1]` isn't portable because `((header*)ptr)->data + 2` -- even if enough space was allocated -- creates a pointer that points outside the length-1 array object (and not the sentinel one past the end). But per C99 6.5.6p8, "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; **otherwise, the behavior is undefined**" (emphasis added). Flexible arrays (6.7.2.1p16) act like an array filling the allocated space to not hit UB here. – Jeff Walden Jun 21 '13 at 04:25
  • The situation appears to be similar in the C89 draft here, too: http://port70.net/~nsz/c/c89/c89-draft.html#3.3.6
  • @R.. I do not have an actual counter-example to the claim that `[1]` must work as well as `[]` but the first example in this post comes close: GCC (and not even particularly recent versions of it) assume that accesses to an array member remain within the array. It is by courtesy that GCC doesn't assume `q->tab[2]` is an unreachable expression in the second example. http://blog.frama-c.com/index.php?post/2013/07/31/From-Pascal-strings-to-Python-tuples – Pascal Cuoq Aug 01 '13 at 18:45
  • @PascalCuoq: In the case of char types, the pointer `q->tab` decays to is also a pointer to a part of the representation of the entire object. – R.. GitHub STOP HELPING ICE Aug 01 '13 at 19:40
  • This idiom is not recommended by [CERT Secure Coding Standards](https://www.securecoding.cert.org/confluence/display/seccode): "[MEMxx-C. Understand how flexible array members are to be used](https://www.securecoding.cert.org/confluence/display/seccode/MEMxx-C.+Understand+how+flexible+array+members+are+to+be+used)" - "The problem with using this approach is that the behavior is undefined when accessing other than the first element of data" – osgx Sep 28 '13 at 18:54
  • **WARNING:** Using `[1]` has been shown to cause GCC to generate incorrect code: https://lkml.org/lkml/2015/2/18/407 – Jonathon Reinhart Feb 20 '15 at 19:49
  • @PascalCuoq: I find it interesting how many needless problems could have been avoided simply by having compilers regard a zero-sized array declaration within a struct as forcing an alignment to the array type and a zero-byte allocation. While I despise the way compilers aggressively using UB for assertions, I would think a compiler should be entitled to say that if `struct S` contains `int foo[1]`, a compiler should be allowed to replace `foo[i]` with `foo[0]`; if zero-element arrays were allowed, accessing array elements beyond `[size-1u]` was UB, such substitution would be legal but... – supercat Apr 06 '15 at 16:59
  • ...code which needed arrays to end with a variable amount of data could still do so. Incidentally, I also think there should have been (still should be) a syntax to declare a variable of a type that ends in a size-zero array and specify an amount of extra space for that array [proposed syntax, if `S` has `foo[0]` as its last element: `struct S foo[3]+[5];` would declare an array of three structures, each of which had space for five elements in its `foo` array.] IMHO, that would be much nicer than anything presently allowed with flexible array members. – supercat Apr 06 '15 at 17:03
  • @supercat that's the start of a very slippery slope – M.M Sep 15 '15 at 15:47
  • @M.M: What's the start of a slippery slope--the replacement of `foo[i]` with `foo[0]`, or the idea that zero-sized objects should be allowed provided programs don't perform arithmetic on pointers to such objects [just as they're forbidden from performing arithmetic on `void*`]? Having a compiler replace `foo[i]` with `foo[0]` would be downright benign compared with hyper-modern optimizations. – supercat Sep 15 '15 at 16:04
  • @supercat Allowing the code to declare an array of a certain size, and then access out of bounds of the declared size. FAM doesn't count as starting the slope IMO, since the lack of a declared size at all marks it as "special". – M.M Sep 15 '15 at 16:05
  • @M.M: Are you disapproving of the practice of having programmers declare a structure as ending with `dat[1]` but then dereferencing data beyond that, disapproving of allowing such accesses with a zero-element array, or both? I don't like the former, but historically it was rendered necessary by the prohibition against zero-element arrays; I'd say that having compilers not bother yielding an error for zero-element arrays but requiring that arrays be accessed in the range 0 to `size-(size_t)1` would have been simpler and cleaner from both a coding and compiler perspective. – supercat Sep 15 '15 at 16:13
  • @M.M: Actually, one thing I've long wished for in C would be a syntax for indicating that a structure member should not allocate space, but be forced to a certain offset relative to another structure member or the end of the structure. If such a thing were supported for bitfields, it could make them portable (e.g. `uint32_t first_two_fields; int field1 = first_two_fields.0:23; int field2 = first_two_fields.23:9;` would mean that `field1` would occupy the lower 23 bits of `first_two_fields` and `field2` would occupy the upper 9 bits). That would have allowed for useful optimizations... – supercat Sep 15 '15 at 16:18
  • ...in cases where a variable-sized portion of the array will be small enough to allow a simpler addressing mode than would be necessary if array subscripts could be of any size. – supercat Sep 15 '15 at 16:22
  • @supercat disapproving of both. In C89 they were hacks that were justifiable in some cases. In C99 there is Flexible Array Member which was introduced precisely to give a well-defined tool that renders all of those hacks unnecessary. Old code should be migrated. The F.A.M. has the advantage of being an incomplete type, so it is impossible to accidentally apply `sizeof` to it – M.M Feb 21 '16 at 10:41
  • @M.M: Under C99 or C11, as far as I can tell, it's impossible to declare an object of static or automatic duration which is type-compatible with a structure containing a Flexible Array Member. I agree the FAM is better than the size-1 and size-0 struct hacks, but wish it had been specified better. Among other things, given `struct x {uint32_t x; uint8_t y; uint16_t z[];}` I would have specified that the offset of `z` should be the same as for any fixed size (and all fixed sizes should imply the same offset), and `sizeof (struct x)` should yield the offset of z. – supercat Feb 21 '16 at 23:30
  • @supercat I think `sizeof(struct x)` does yield the offset. There may be padding before `z` according to the standard, but IMO that is not an issue, as (a) there may be padding almost anywhere and in practice compilers only use padding where required for alignment, and (b) it's easy enough to write code that checks for padding, and/or does not rely on padding's presence – M.M Feb 21 '16 at 23:45
  • @M.M: In cases where the structure would have an alignment of 4, and the "natural" offset for x would be 6 (which is not a multiple of 4), does the Standard indicate that neither the size nor the offset should be rounded up to 8? – supercat Feb 21 '16 at 23:47
  • @supercat AFAIK there is no such requirement; there cannot be an array of structs with flexible array member. gcc does actually give a larger result for `sizeof` than the `offsetof` in a case I tried... I agree that this is lame, however maybe the gcc developers had backwards compatibility of some sort in mind. gcc does seem to implement array of f.a.m. structs as an extension – M.M Feb 21 '16 at 23:53
  • @M.M: I'm not positive, but I believe the Standard (stupidly IMHO) requires that the length of the incomplete structure be rounded up to its alignment, even though that messes up efforts to compute the size of a structure with some number of elements in the flexible array. I suspect the Standard would allow an implementation to add padding before the FAM to make its offset match the struct length even when a normal array's offset would not have been influenced in such fashion, but that would interfere with what would otherwise be the most sensible way of creating... – supercat Feb 22 '16 at 00:10
  • @supercat I don't see any such requirement; it just says that there might be trailing padding (not that there must be) – M.M Feb 22 '16 at 00:16
  • ...a static or automatic item containing an FAM when using dialects of C which use a relaxed version of C99's type rules [i.e. declare a structure which is identical except that it has a fixed-sized array, and then alias the pointer]. Actually, what might have been best would have been to define a syntax for "struct foo(x) {int blah; char y; short dat[x];}" and then say that a "struct foo(3)" will be a "struct foo" where the final array has a size of 3. A platform could then say that all "struct foo(N)" will have the same offset for "dat" and will be alias-compatible, even if... – supercat Feb 22 '16 at 00:18
  • ...neither assumption would hold for independent structures with different fixed array sizes. I have no problem with the idea that when programming constructs exist to do things that programmers need to do, programmers should use such structures rather than nasty hacks. On the other hand, I do have a problem with the idea that a language should forbid hacks to do things which need to be done, and for which the language provides no non-hacky alternative. – supercat Feb 22 '16 at 00:20
  • **WARNING**: using `[1]` will result in bound violation reports when using `gcc -mmpx -fcheck-pointer-bounds`. – Lekensteyn Sep 02 '16 at 21:13
  • @Remo.D Why after 11 years have you still not revised or deleted this answer? As other comments have pointed out, using `foo[1]` instead of `foo[]` isn't just wrong in that it *isn't* portable, it is **dangerously** wrong, as it causes undefined behavior -- making compilers silently generate incorrect/unintended machine code. – Will Jun 17 '19 at 07:26
  • @Will. Because I believe that, together with all these comments, it is still instructional. I added a note to remind people to read the comments carefully. – Remo.D Jun 17 '19 at 09:09

You meant...

struct header
{
 size_t len;
 unsigned char data[];
}; 

In C, that's a common idiom. I think many compilers also accept:

  unsigned char data[0];

Yes, it's dangerous, but then again, it's really no more dangerous than normal C arrays, i.e., VERY dangerous ;-). Use it with care and only in circumstances where you truly need an array of unknown size. Make sure you malloc and free the memory correctly, using something like:

  foo = malloc(sizeof(struct header) + N * sizeof(foo->data[0]));
  foo->len = N;

An alternative is to make data just be a pointer to the elements. You can then realloc() data to the correct size as required.

struct header
{
 size_t len;
 unsigned char *data;
};
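
A minimal sketch of that pointer-based alternative, assuming a hypothetical `header_resize` helper. Because `data` is a separate allocation, it can be grown with `realloc()` while the `struct header` object itself (and any pointers to it) stays put, at the cost of a second allocation and an extra indirection on every access.

```c
#include <stdlib.h>

struct header {
    size_t len;
    unsigned char *data;   /* separately allocated buffer */
};

/* Hypothetical helper: resize h->data to new_len bytes.
 * Returns 0 on success, -1 on failure (h is unchanged on failure). */
int header_resize(struct header *h, size_t new_len)
{
    if (new_len == 0) {
        /* Avoid realloc(ptr, 0), whose behavior varies between implementations. */
        free(h->data);
        h->data = NULL;
        h->len = 0;
        return 0;
    }
    unsigned char *p = realloc(h->data, new_len);
    if (p == NULL)
        return -1;         /* the old buffer is still valid and still owned by h */
    h->data = p;           /* realloc() may have moved the buffer */
    h->len = new_len;
    return 0;
}
```

Starting from `struct header h = {0, NULL};` works because `realloc(NULL, n)` behaves like `malloc(n)`.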

Of course, if you were asking about C++, either of these would be bad practice. Then you'd typically use STL vectors instead.

Roddy
  • provided that you are coding on a system where STL is supported! – Airsource Ltd Oct 29 '08 at 14:36
  • C++ but no STL... That's not a pleasant thought! – Roddy Oct 29 '08 at 14:39
  • Name one compiler that accepts zero-length arrays. (If the answer was GCC, now name another.) It is not sanctioned by the C standard. – Jonathan Leffler Nov 01 '08 at 00:51
  • I've worked in a C++ but no STL environment - we had our own containers which provided the commonly used functionality without the full generality of the STL iterator system. They were easier to understand and had good performance. However, this was in 2001. – pjc50 May 20 '09 at 14:35
  • @JonathanLeffler Accepted by GCC and Clang, which covers two out of the three main compilers in use today. (MSVC is the other big one, and that's only really relevant on one — admittedly very common — platform.) – Donal Fellows Apr 28 '20 at 11:18
  • @JonathanLeffler: Many compilers accepted the construct before the Standard broke it, since processing the construct was not only easier and more useful than processing C99-style flexible array members, but it was easier than gratuitously rejecting such useful constructs. – supercat Mar 22 '23 at 22:59

I've seen something like this in C Interfaces and Implementations:

struct header {
  size_t len;
  unsigned char *data;
};

struct header *p;
p = malloc(sizeof(*p) + len + 1);
p->data = (unsigned char*) (p + 1);  // memory after p is mine!

Note: data need not be the last member.
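
A runnable sketch of that pattern, assuming a hypothetical `header_new` that stores a copy of a string. The struct and its bytes come from one `malloc()`, and `data` is pointed at the memory just past the struct. This is safe for `unsigned char`, which has no alignment requirement; for other element types, the address `p + 1` would also have to be suitably aligned for that type.

```c
#include <stdlib.h>
#include <string.h>

struct header {
    size_t len;
    unsigned char *data;   /* points into the same allocation as the struct */
};

/* Hypothetical helper: one allocation holding the struct followed by
 * len + 1 bytes (room for a terminating '\0'). Release it with free(p). */
struct header *header_new(const char *s)
{
    size_t len = strlen(s);
    struct header *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    p->data = (unsigned char *)(p + 1);   /* memory after *p belongs to us */
    memcpy(p->data, s, len + 1);          /* copies the '\0' as well */
    return p;
}
```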

Nyan
  • Indeed this has the advantage that `data` need not be the last member, but it also incurs an extra dereference every time `data` is used. Flexible arrays replace that dereference with a constant offset from the main struct pointer, which is free on some particularly common machines and cheap elsewhere. – R.. GitHub STOP HELPING ICE Aug 06 '10 at 15:39
  • @R.. Although, considering the target address is *necessarily* the byte directly after the pointer, it is approximately 100% guaranteed to already be in L1 cache, giving the entire dereference something like half a cycle of overhead. However, the point stands that flexible arrays are a better idea here. – pmttavara Apr 01 '18 at 11:28
  • With `unsigned char *`, `p->data = (unsigned char*) (p + 1 )` is OK. Yet with `double complex *`, `p->data = (double complex *) (p + 1 )` may cause alignment problems. – chux - Reinstate Monica May 16 '18 at 17:21
  • This answer is technically irrelevant, as it _does something different_ (it lays out the data differently in memory). While the pattern it describes is often useful, that doesn't mean that it can be a replacement for the other. – Donal Fellows Apr 28 '20 at 11:15

There are some downsides related to how structs are sometimes used, and it can be dangerous if you don't think through the implications.

For your example, if you start a function:

void test(void) {
  struct header hdr;
  unsigned char *p = &hdr.data[0];

  ...
}

Then the results are undefined (since no storage was ever allocated for data). This is something that you will normally be aware of, but there are cases where C programmers are likely used to being able to use value semantics for structs, which breaks down in various other ways.

For instance, if I define:

struct header2 {
  int len;
  char data[MAXLEN]; /* MAXLEN some appropriately large number */
};

Then I can copy two instances simply by assignment, i.e.:

struct header2 inst1 = inst2;

Or, if inst1 and inst2 are pointers:

*inst1 = *inst2;

This, however, won't work for flexible array members, since their contents are not copied. Instead, you have to malloc a new struct of the right size and copy the array over with memcpy or equivalent.

struct header3 {
  int len;
  char data[]; /* flexible array member */
};

Likewise, writing a function that accepts a struct header3 by value will not work, since function arguments are, again, copied by value, and thus you will get only len, not the elements of your flexible array member.

 void not_good ( struct header3 ) ;

This does not make it a bad idea to use, but you do have to keep in mind to always dynamically allocate these structures and only pass them around as pointers.

 void good ( struct header3 * ) ;
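
Since plain assignment copies only `len`, a copy has to be made by hand. A minimal sketch, with `header3_clone` as a hypothetical name: allocate a new object sized for the source's `len` elements and `memcpy()` the whole block, flexible array included. It assumes `len` is non-negative and matches the size the source was allocated with.

```c
#include <stdlib.h>
#include <string.h>

struct header3 {
    int len;
    char data[];   /* flexible array member */
};

/* Hypothetical deep copy: returns a new heap object (release with free()),
 * or NULL if allocation fails. Assumes src->len >= 0. */
struct header3 *header3_clone(const struct header3 *src)
{
    size_t size = sizeof *src + (size_t)src->len * sizeof src->data[0];
    struct header3 *dst = malloc(size);
    if (dst != NULL)
        memcpy(dst, src, size);   /* copies len and every data element */
    return dst;
}
```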
Chef Gladiator
wds

As a side note, for C89 compatibility, such a structure should be allocated like this:

struct header *my_header
  = malloc(offsetof(struct header, data) + n * sizeof my_header->data[0]);

Or with macros :

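#include <stddef.h> /* offsetof, size_t */
#include <stdint.h> /* SIZE_MAX (C99; in C89, use ((size_t)-1)) */
#include <stdlib.h> /* malloc */

/* or whatever maximum length for an array; note that many compilers reject
   an array as large as SIZE_MAX inside a struct, so a smaller bound may be needed */
#define FLEXIBLE_SIZE SIZE_MAX
#define SIZEOF_FLEXIBLE(type, member, length) \
  ( offsetof(type, member) + (length) * sizeof ((type *)0)->member[0] )

struct header {
  size_t len;
  unsigned char data[FLEXIBLE_SIZE];
};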

...

size_t n = 123;
struct header *my_header = malloc(SIZEOF_FLEXIBLE(struct header, data, n));
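
With a C99 flexible array member, the same `SIZEOF_FLEXIBLE` macro works without any placeholder array size. A sketch under that assumption, with `header_alloc` as a hypothetical helper: it never allocates less than `sizeof(struct header)`, even when `offsetof(...) + n` would be smaller. That clamp is a conservative choice made here, not part of the answer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Size needed for `length` elements of flexible member `member` of `type`
 * (the answer's macro; sizeof does not evaluate the null pointer). */
#define SIZEOF_FLEXIBLE(type, member, length) \
  ( offsetof(type, member) + (length) * sizeof ((type *)0)->member[0] )

struct header {
    size_t len;
    unsigned char data[];   /* C99 flexible array member, no placeholder size */
};

/* Hypothetical helper: never allocate less than sizeof(struct header). */
struct header *header_alloc(size_t n)
{
    size_t size = SIZEOF_FLEXIBLE(struct header, data, n);
    if (size < sizeof(struct header))
        size = sizeof(struct header);
    struct header *h = malloc(size);
    if (h != NULL)
        h->len = n;
    return h;
}
```

The computed size never exceeds `sizeof(struct header) + n`, because any trailing padding counted by `sizeof` lies at or after the offset of `data`.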

Setting FLEXIBLE_SIZE to SIZE_MAX all but ensures that the following mistaken allocation fails instead of silently succeeding:

struct header *my_header = malloc(sizeof *my_header);
diapir