
Possible Duplicate:
Is the “struct hack” technically undefined behavior?

Normally accessing an array beyond its end is undefined behavior in C. For example:

int foo[1];
foo[5] = 1; //Undefined behavior

Is it still undefined behavior if I know that the memory area after the end of the array has been allocated, with malloc or on the stack? Here is an example:

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
  int len;
  int data[1];
} MyStruct;

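int main(void)
{
  /* Room for the struct plus 10 extra ints after it */
  MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
  if (foo == NULL)
    return EXIT_FAILURE;
  foo->data[5] = 1;   /* index 5 is past the declared data[1] */
  free(foo);
  return EXIT_SUCCESS;
}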

I have seen this pattern used in several places to make a variable-length struct, and it seems to work in practice. Is it technically undefined behavior?

Tor Klingberg
  • Raymond Chen has an article about this pattern in Windows titled _[Why do some structures end with an array of size 1?](http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx)_ – Tor Klingberg Sep 10 '12 at 15:19

3 Answers


What you are describing is affectionately called "the struct hack". It's not clear whether it is strictly conforming, but it was, and still is, widely used.

Since C99, it has started to be replaced by the "flexible array member": you may declare a field such as int data[]; provided it is the last member of the struct.

cnicutar
  • I can see how it would be a problem if you had a struct like `{ double; char[1]; }`, non? – Kerrek SB Sep 10 '12 at 15:05
  • Do you know of any environment (compiler, platform, runtime library) which does not explicitly support this, or even breaks it? –  Sep 10 '12 at 15:06
  • @KerrekSB I'm not sure I understand why that would be a problem ? – cnicutar Sep 10 '12 at 15:07
  • @delnan It works at least on gcc and on cl. I've seen it used *widely* on the kernel side – cnicutar Sep 10 '12 at 15:08
  • @cnicutar: Well, you're allocating more than you need, and you'd end up writing into the unspecified padding area of the struct, non? – Kerrek SB Sep 10 '12 at 15:13
  • @KerrekSB Well, as long as one writes only to the part beyond the last element, should be ok, no ? – cnicutar Sep 10 '12 at 15:15
  • @cnicutar: I don't know... I think C99 makes zero-length arrays legal, but I don't know the precise semantics. – Kerrek SB Sep 10 '12 at 15:17
  • @KerrekSB, no, zero-sized arrays are not allowed in C99 or C11. What is allowed is what cnicutar describes: flexible array members. They are not of size 0 but of unspecified size. – Jens Gustedt Sep 10 '12 at 15:21
  • @delnan: The concern is that a C implementation trying to do strict bounds checking on array accesses could break it. However, I believe this concern is mistaken, at least when the array has character type, since the pointer arithmetic involved in accessing past the array size is valid as pointer arithmetic on the *object representation* array (an array of `unsigned char` overlaid with the whole object obtained by `malloc`). Thus, I think it's impossible to break the code while conforming to the C standard. – R.. GitHub STOP HELPING ICE Sep 10 '12 at 16:03

Under C11 6.5.6 Additive operators:

Semantics

8 - [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

If the memory is allocated by malloc then:

7.22.3 Memory management functions

1 - [...] The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation.

This does not, however, countenance using such memory without an appropriate cast, so for MyStruct as defined above only the declared members of the object can be used. This is why flexible array members (6.7.2.1:18) were added.

Also note that Annex J.2 (Undefined behavior) calls out array access:

1 - The behavior is undefined in the following circumstances: [...]
— Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object.
— Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated.
— An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]).

So, as you suspect, this is undefined behaviour:

  MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
  foo->data[5] = 1;

However, you would be allowed to do the following:

  MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
  /* offsetof is declared in <stddef.h> */
  ((int *) foo)[(offsetof(MyStruct, data) / sizeof(int)) + 5] = 1;

C++ is laxer in this regard; 3.9.2 Compound types [basic.compound] has:

3 - [...] If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

This difference makes sense in light of the more aggressive pointer optimisations C permits, e.g. with the restrict qualifier.

ecatmur

The C99 rationale document talks about this in section 6.7.2.1.

A new feature of C99: There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:

...

The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.

The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.

The struct hack is undefined behavior. This is supported not only by the C specification itself (the other answers give citations) but also by the committee's recorded opinion.

So the answer is yes, it is undefined behavior according to the standard document, but it is well defined according to the de facto C standard. I imagine most compiler writers are intimately familiar with the hack. From GCC's tree-vrp.c:

   /* Accesses after the end of arrays of size 0 (gcc
      extension) and 1 are likely intentional ("struct
      hack").  */

I think there's a good chance you might even find the struct hack in compiler test suites.

Dietrich Epp